Voice data processing device and processing method

ABSTRACT

There is disclosed a speech processing device in which prediction taps for finding prediction values of speech of high sound quality are extracted from the synthesized sound obtained on supplying linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, the speech of high sound quality being higher in sound quality than the synthesized sound, and in which the prediction taps are used along with preset tap coefficients to perform preset predictive calculations to find the prediction values of the speech of high sound quality. The device includes a prediction tap extracting unit (45) for extracting, from the synthesized sound, the prediction taps used for predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, and a class tap extraction unit (46) for extracting class taps, used for classifying the target speech into one of a plurality of classes, from the above code. The device also includes a classification unit (47) for finding the class of the target speech based on the class taps, an acquisition unit for acquiring the tap coefficients associated with the class of the target speech from among the tap coefficients found by learning on a class-by-class basis, and a prediction unit (49) for finding the prediction values of the target speech using the prediction taps and the tap coefficients associated with the class of the target speech.

TECHNICAL FIELD

[0001] This invention relates to a method and an apparatus for processing data, a method and an apparatus for learning and a recording medium. More particularly, it relates to a method and an apparatus for processing data, a method and an apparatus for learning and a recording medium according to which the speech coded in accordance with the CELP (code excited linear prediction coding) system can be decoded to the speech of high sound quality.

BACKGROUND ART

[0002] First, an instance of a conventional portable telephone set is explained with reference to FIGS. 1 and 2.

[0003] This portable telephone set is adapted for performing transmission processing of coding the speech into a preset code in accordance with the CELP system and transmitting the resulting code, and for performing the receipt processing of receiving the code transmitted from other portable telephone sets and decoding the received code into speech. FIGS. 1 and 2 show a transmitter for performing transmission processing and a receiver for performing receipt processing, respectively.

[0004] In the transmitter, shown in FIG. 1, the speech uttered by a user is input to a microphone 1 where the speech is transformed into speech signals as electrical signals, which are routed to an A/D (analog/digital) converter 2. The A/D converter 2 samples the analog speech signals from the microphone 1 with, for example, the sampling frequency of 8 kHz, for A/D conversion to digital speech signals, and further quantizes the resulting digital signals with a preset number of bits to route the resulting quantized signals to an operating unit 3 and to an LPC (linear prediction coding) unit 4.

[0005] The LPC unit 4 performs LPC analysis of speech signals from the A/D converter 2, in terms of a frame corresponding to e.g., 160 samples as a unit, to find P-dimensional linear prediction coefficients α₁, α₂, . . . , α_(P). The LPC unit 4 sends a vector, having these P-dimensional linear prediction coefficients α_(p), where p=1, 2, . . . , P, as components, to a vector quantizer 5, as a feature vector α of the speech.

[0006] The vector quantizer 5 holds a codebook, associating the code vector, having the linear prediction coefficients as components, with the code, and quantizes the feature vector α from the LPC unit 4, based on this codebook, to send the code resulting from the vector quantization, sometimes referred to below as A code (A_code), to a code decision unit 15.

[0007] The vector quantizer 5 sends the linear prediction coefficients α₁′, α₂′, . . . , α_(P)′, as components forming the code vector α′ corresponding to the A code, to a speech synthesis filter 6.
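The codebook lookup of paragraphs [0006] and [0007] can be illustrated by the following minimal Python sketch. The text does not state the distance measure used by the vector quantizer 5; a nearest-neighbour search under the squared Euclidean distance is assumed here. The function name `quantize`, the use of the list index as the A code and the three-entry codebook are hypothetical and serve illustration only.

```python
def quantize(feature, codebook):
    """Return (code, code_vector) for the codebook entry nearest to feature.

    Assumption of this sketch: the code is the index of the entry in the
    codebook list, and "nearest" means smallest squared Euclidean distance.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    code = min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))
    return code, codebook[code]


# Hypothetical codebook of 2-dimensional code vectors (P = 2).
codebook = [[0.0, 0.0], [-0.9, 0.2], [-1.5, 0.7]]

# Feature vector alpha from the LPC analysis; a_code plays the role of the
# A code and alpha_q the role of the code vector alpha' sent to the filter.
a_code, alpha_q = quantize([-1.0, 0.3], codebook)
```

In this sketch the receiver would recover the same code vector simply as `codebook[a_code]`, provided that it holds an identical codebook.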

[0008] The speech synthesis filter 6 is e.g., a digital filter of the IIR (infinite impulse response) type, and executes speech synthesis, with the linear prediction coefficients α_(p)′, where p=1, 2, . . . , P, from the vector quantizer 5 as tap coefficients of the IIR filter and with the residual signals e from an operating unit 14 as an input signal.

[0009] That is, in the LPC analysis, executed by the LPC unit 4, it is assumed that a one-dimensional linear combination represented by the equation (1):

$s_{n} + \alpha_{1} s_{n-1} + \alpha_{2} s_{n-2} + \cdots + \alpha_{P} s_{n-P} = e_{n} \qquad (1)$

[0010] holds, where s_(n) is the (sampled value of the) speech signal at the current time n and s_(n−1), s_(n−2), . . . , s_(n−P) are the past P sample values neighboring thereto, and the linear prediction coefficients α_(p), which will minimize the square error between the actual sample value s_(n) and a value of linear prediction s_(n)′ thereof in case the predicted value (linear prediction value) s_(n)′ of the sampled value of the speech signal s_(n) at the current time is linear-predicted from the P past sample values s_(n−1), s_(n−2), . . . , s_(n−P) in accordance with the following equation (2):

$s_{n}' = -\left(\alpha_{1} s_{n-1} + \alpha_{2} s_{n-2} + \cdots + \alpha_{P} s_{n-P}\right) \qquad (2)$

[0011] are found.
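One common way of finding coefficients that minimize the square error of the prediction (2) is the autocorrelation method solved by the Levinson-Durbin recursion. The text does not state which procedure the LPC unit 4 uses, so the following Python sketch is an illustration under that assumption only; the function names `autocorr`, `levinson_durbin` and `lpc` are hypothetical.

```python
def autocorr(s, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for alpha_1..alpha_P in the sign convention of equation (1).

    Assumes r[0] > 0, i.e. the frame is not all zeros.
    """
    a = [1.0] + [0.0] * order      # a[0] = 1 multiplies s[n] itself
    err = r[0]                     # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err             # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc(s, order):
    """Linear prediction coefficients of a frame s for the given order P."""
    return levinson_durbin(autocorr(s, order), order)
```

As a usage example, a frame generated by the recursion s_(n) = 0.9 s_(n−1) − 0.2 s_(n−2), started from the samples 1.0 and 0.9 and long enough to have decayed, should yield coefficients close to α₁ = −0.9 and α₂ = 0.2 from `lpc(s, 2)`.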

[0012] In the above equation (1), {e_(n)} ( . . . , e_(n−1), e_(n), e_(n+1), . . . ) are mutually uncorrelated random variables with an average value equal to 0 and with a variance equal to a preset value of β².

[0013] From the equation (1), the sample value s_(n) may be represented by the following equation (3):

$s_{n} = e_{n} - \left(\alpha_{1} s_{n-1} + \alpha_{2} s_{n-2} + \cdots + \alpha_{P} s_{n-P}\right) \qquad (3)$.

[0014] This may be Z-transformed to give the following equation (4):

$S = \dfrac{E}{1 + \alpha_{1} z^{-1} + \alpha_{2} z^{-2} + \cdots + \alpha_{P} z^{-P}} \qquad (4)$

[0015] where S and E denote Z-transforms of s_(n) and e_(n) in the equation (3), respectively.

[0016] From the equations (1) and (2), e_(n) can be represented by the following equation (5):

$e_{n} = s_{n} - s_{n}' \qquad (5)$

[0017] and is termed a residual signal between the real sample value s_(n) and the linear predicted value s_(n)′ thereof.

[0018] Thus, the speech signal s_(n) may be found from the equation (4), using the linear prediction coefficients α_(p) as tap coefficients of the IIR filter and also using the residual signal e_(n) as an input signal to the IIR filter.

[0019] The speech synthesis filter 6 calculates the equation (4), using the linear prediction coefficients α_(p)′ from the vector quantizer 5 as tap coefficients and also using the residual signal e from the operating unit 14 as an input signal, as described above, to find speech signals (synthesized speech signals) ss.
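In the time domain, the filtering of equation (4) is the recursion of equation (3). The following Python sketch shows that recursion together with the inverse relation of equation (1). It assumes that all samples before the start of the input are zero, which the text does not specify; the function names and the numerical values are hypothetical.

```python
def synthesize(e, alpha):
    """All-pole IIR filter of equation (3):
    s[n] = e[n] - sum over p of alpha[p-1] * s[n-p]."""
    s = []
    for n, e_n in enumerate(e):
        # Samples before n = 0 are treated as zero (assumption of this sketch).
        past = sum(a * s[n - p]
                   for p, a in enumerate(alpha, start=1) if n - p >= 0)
        s.append(e_n - past)
    return s


def residual(s, alpha):
    """Equation (1): e[n] = s[n] + sum over p of alpha[p-1] * s[n-p]."""
    return [s_n + sum(a * s[n - p]
                      for p, a in enumerate(alpha, start=1) if n - p >= 0)
            for n, s_n in enumerate(s)]


alpha = [-0.9, 0.2]         # hypothetical coefficients, P = 2
e = [1.0, 0.0, 0.0, 0.0]    # unit impulse used as the residual signal
ss = synthesize(e, alpha)   # synthesized signal ss
```

Applying `residual` to `ss` with the same coefficients returns the original input, up to rounding. Applying it with different coefficients does not, which is the situation that arises when quantized coefficients α_(p)′ are used in place of α_(p).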

[0020] Meanwhile, the speech synthesis filter 6 uses not the linear prediction coefficients α_(p), obtained as the result of the LPC analysis by the LPC unit 4, but the linear prediction coefficients α_(p)′ of the code vector corresponding to the code obtained by their vector quantization. The synthesized speech signal output by the speech synthesis filter 6 is therefore not, in general, the same as the speech signal output by the A/D converter 2.

[0021] The synthesized speech signal ss, output by the speech synthesis filter 6, is sent to the operating unit 3, which subtracts the speech signal s, output from the A/D converter 2, from the synthesized speech signal ss from the speech synthesis filter 6, to send the resulting difference value to a square error operating unit 7. The square error operating unit 7 finds the square sum of the difference values from the operating unit 3 (square sum of the sample values of the k-th frame) to send the resulting square sum to a minimum square error decision unit 8.

[0022] The minimum square error decision unit 8 holds an L-code (L_code) as a code representing the lag, a G-code (G_code) as a code representing the gain and an I-code (I_code) as the code representing the codeword, in association with the square error output by the square error operating unit 7, and outputs the I-code, G-code and the L-code corresponding to the square error output from the square error operating unit 7. The L-code, G-code and the I-code are sent to an adaptive codebook storage unit 9, a gain decoder 10 and to an excitation codebook storage unit 11, respectively. The L-code, G-code and the I-code are also sent to a code decision unit 15.

[0023] The adaptive codebook storage unit 9 holds an adaptive codebook, which associates e.g., a 7-bit L-code with a preset delay time (lag), and delays the residual signal e supplied from the operating unit 14 by a delay time associated with the L-code supplied from the minimum square error decision unit 8 to output the resulting delayed signal to an operating unit 12.

[0024] Since the adaptive codebook storage unit 9 outputs the residual signal e with a delay corresponding to the L-code, the output signal may be said to be a signal close to a periodic signal having the delay time as a period. This signal mainly becomes a driving signal for generating a synthesized sound of the voiced sound in the speech synthesis employing linear prediction coefficients.

[0025] The gain decoder 10 holds a table which associates the G-code with the preset gains β and γ, and outputs gain values β and γ associated with the G-code supplied from the minimum square error decision unit 8. The gain values β and γ are supplied to the operating units 12 and 13.

[0026] An excitation codebook storage unit 11 holds an excitation codebook, which associates e.g., a 9-bit I-code with a preset excitation signal, and outputs the excitation signal, associated with the I-code output from the minimum square error decision unit 8, to the operating unit 13.

[0027] The excitation signal stored in the excitation codebook is a signal close e.g., to the white noise and becomes a driving signal mainly used for generating the synthesized sound of the unvoiced sound in the speech synthesis employing linear prediction coefficients.

[0028] The operating unit 12 multiplies an output signal of the adaptive codebook storage unit 9 with the gain value β output by the gain decoder 10 and routes a product value l to the operating unit 14. The operating unit 13 multiplies the output signal of the excitation codebook storage unit 11 with the gain value γ output by the gain decoder 10 to send the resulting product value n to the operating unit 14. The operating unit 14 sums the product value l from the operating unit 12 with the product value n from the operating unit 13 to send the resulting sum as the residual signal e to the speech synthesis filter 6.
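The formation of the residual signal by the operating units 12 to 14 can be sketched in Python as follows. The sketch assumes that the adaptive codebook output is the residual signal delayed by the lag on a sample-by-sample basis, with newly formed samples fed back into the history; the text only states that the residual signal is delayed by the lag. The function name `build_residual` and all numerical values are hypothetical.

```python
def build_residual(past_e, lag, excitation, beta, gamma):
    """Form one frame of the residual signal e as beta * l + gamma * n.

    past_e     : earlier residual samples (adaptive codebook contents)
    lag        : delay in samples selected by the L code (1 <= lag <= len(past_e))
    excitation : excitation codebook entry selected by the I code
    beta, gamma: gain values selected by the G code
    """
    history = list(past_e)
    frame = []
    for n_k in excitation:
        delayed = history[-lag]                # residual 'lag' samples back
        e_k = beta * delayed + gamma * n_k     # sum formed by operating unit 14
        frame.append(e_k)
        history.append(e_k)                    # new sample becomes history
    return frame


# Hypothetical values for illustration only.
frame = build_residual(past_e=[1.0, 2.0, 3.0, 4.0], lag=2,
                       excitation=[0.5, -0.5], beta=0.5, gamma=2.0)
```

With a lag shorter than the frame, the feedback of new samples into `history` makes the adaptive contribution repeat with the lag as its period, in line with the near-periodic character described for the output of the adaptive codebook storage unit 9.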

[0029] In the speech synthesis filter 6, the input signal, which is the residual signal e, supplied from the operating unit 14, is filtered by the IIR filter, having the linear prediction coefficients α_(p)′ supplied from the vector quantizer 5 as tap coefficients, and the resulting synthesized signal is sent to the operating unit 3. In the operating unit 3 and the square error operating unit 7, operations similar to those described above are carried out and the resulting square errors are sent to the minimum square error decision unit 8.

[0030] The minimum square error decision unit 8 verifies whether or not the square error from the square error operating unit 7 has become smallest (locally minimum). If it is verified that the square error is not locally minimum, the minimum square error decision unit 8 outputs the L code, G code and the I code, corresponding to the square error, and subsequently repeats a similar sequence of operations.

[0031] If it is found that the square error has become smallest, the minimum square error decision unit 8 outputs a definite signal to the code decision unit 15. The code decision unit 15 is adapted for latching the A code, supplied from the vector quantizer 5, and for sequentially latching the L code, G code and the I code, sent from the minimum square error decision unit 8. On receipt of the definite signal from the minimum square error decision unit 8, the code decision unit 15 sends the A code, L code, G code and the I code, then latched, to a channel encoder 16. The channel encoder 16 then multiplexes the A code, L code, G code and the I code, sent from the code decision unit 15, to output the resulting multiplexed data as code data, which code data is transmitted over a transmission channel.

[0032] For simplicity in explanation, the A code, L code, G code and the I code are assumed to be found from frame to frame. It is however possible to divide e.g., one frame into four sub-frames and to find the L code, G code and the I code on the sub-frame basis.

[0033] It should be noted that, in FIG. 1, as in FIGS. 2, 11 and 12, explained later on, an array variable [k] is formed by affixing [k] to each variable. In the present specification, explanation on this k, representing the number of frames, is sometimes omitted.

[0034] The code data, sent from a transmitter of another portable telephone set, is received by a channel decoder 21 of a receiver shown in FIG. 2. The channel decoder 21 decodes the L code, G code, I code and the A code from the code data to send the so separated respective codes to an adaptive codebook storage unit 22, a gain decoder 23, an excitation codebook storage unit 24 and to a filter coefficient decoder 25.

[0035] The adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and the operating units 26 to 28 are configured similarly to the adaptive codebook storage unit 9, gain decoder 10, excitation codebook storage unit 11 and the operating units 12 to 14, respectively, and perform the processing similar to that explained with reference to FIG. 1 to decode the L code, G code and the I code into the residual signal e. This residual signal e is sent as an input signal to a speech synthesis filter 29.

[0036] A filter coefficient decoder 25 holds the same codebook as that stored in the vector quantizer 5 of FIG. 1 and decodes the A code to the linear prediction coefficients α_(p)′, which are then routed to the speech synthesis filter 29.

[0037] The speech synthesis filter 29 is configured similarly to the speech synthesis filter 6 of FIG. 1, and solves the equation (4), with the linear prediction coefficients α_(p)′ from the filter coefficient decoder 25 as tap coefficients and with the residual signal e from the operating unit 28 as an input signal, to generate the synthesized speech signal obtained when the square error has been found to be minimum by the minimum square error decision unit 8 of FIG. 1. This synthesized speech signal is sent to a D/A (digital/analog) converter 30. The D/A converter 30 D/A converts the synthesized speech signal from the speech synthesis filter 29 to send the resulting analog signal to a loudspeaker 31 as output.

[0038] The transmitter of the portable telephone set transmits an encoded version of the residual signal and the linear prediction coefficients, as filter data supplied to the speech synthesis filter 29 of the receiver, as described above. Thus, the receiver decodes the codes into the residual signal and the linear prediction coefficients. The so decoded residual signal and linear prediction coefficients are corrupted with errors, such as quantization errors. Thus, the so decoded residual signals and so decoded linear prediction coefficients, sometimes referred to below as decoded residual signals and decoded linear prediction coefficients, respectively, are not the same as the residual signal and linear prediction coefficients obtained on LPC analysis of the speech, so that the synthesized speech signals, output by the receiver's speech synthesis filter 29, are distorted and therefore are deteriorated in sound quality.

DISCLOSURE OF THE INVENTION

[0039] In view of the above-described status of the art, it is an object of the present invention to provide a method and an apparatus for processing data, a method and an apparatus for learning and a recording medium, whereby the synthesized sound of high sound quality may be achieved.

[0040] For accomplishing the above object, the present invention provides a speech processing device including a class tap extraction unit for extracting class taps, used for classifying the target speech to one of a plurality of classes, from the code, a classification unit for finding the class of the target speech based on the class taps, an acquisition unit for acquiring the tap coefficients associated with the class of the target speech from among the tap coefficients as found on learning from class to class, and a prediction unit for finding the prediction values of the target speech using the prediction taps and the tap coefficients associated with the class of the target speech. With the speech of high sound quality, the prediction values of which are to be found, as the target speech, the prediction taps used for predicting the target speech are extracted from the synthesized sound. The class taps, used for sorting the target speech into one of plural classes, are extracted from the code, and the tap coefficients, associated with the class of the target speech, are acquired from the class-based tap coefficients as found on learning. The prediction values of the target speech are found using the prediction taps and the tap coefficients associated with the class of the target speech.

[0041] The learning device according to the present invention includes a class tap extraction unit for extracting class taps from the code, the class taps being used for classifying the speech of high sound quality, as target speech, the prediction values of which are to be found, a classification unit for finding a class of the target speech based on the class taps, and a learning unit for carrying out learning so that the prediction errors of the prediction values of the speech of high sound quality obtained on carrying out predictive calculations using the tap coefficients and the synthesized sound will be statistically minimum, to find the tap coefficients from class to class. With the speech of high sound quality, the prediction values of which are to be found, as the target speech, the class taps used for sorting the target speech to one of plural classes are extracted from the code, and the class of the target speech is found based on the class taps, by way of classification. The learning then is carried out so that the prediction errors of the prediction values of the speech of high sound quality, as obtained in carrying out predictive calculations using the tap coefficients and the synthesized sound, will be statistically smallest to find the class-based tap coefficients.

[0042] The data processing device according to the present invention includes a code decoding unit for decoding the code to output decoded filter data, an acquisition unit for acquiring preset tap coefficients as found by carrying out learning, and a prediction unit for carrying out preset predictive calculations, using the tap coefficients and the decoded filter data, to find prediction values of the filter data, to send the so found prediction values to the speech synthesis filter. The code is decoded, and the decoded filter data is output. The preset tap coefficients, as found on effecting the learning, are acquired, and preset predictive calculations are carried out using the tap coefficients and the decoded filter data to find predicted values of the filter data, which then is output to the speech synthesis filter.

[0043] The learning device according to the present invention includes a code decoding unit for decoding the code corresponding to filter data to output decoded filter data, and a learning unit for carrying out learning so that the prediction errors of prediction values of the filter data obtained on carrying out predictive calculations using the tap coefficients and decoded filter data will be statistically smallest to find the tap coefficients. The code associated with the filter data is decoded and the decoded filter data is output in a code decoding step. Then, learning is carried out so that prediction errors of the prediction values of the filter data obtained on carrying out predictive calculations using the tap coefficients and the decoded filter data will be statistically minimum.

[0044] The speech processing device according to the present invention includes a prediction tap extraction unit for extracting prediction taps usable for predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, a class tap extraction unit for extracting class taps, usable for sorting the target speech to one of a plurality of classes, by way of classification, from the synthesized sound, the code or the information derived from the code, an acquisition unit for acquiring the tap coefficients associated with the class of the target speech from the tap coefficients as found on learning from one class to another, and a prediction unit for finding the prediction values of the target speech using the prediction taps and the tap coefficients associated with the class of the target speech. With the speech of high sound quality, the prediction values of which are to be found, as the target speech, the prediction taps, used for predicting the target speech, are extracted from the synthesized sound and the code or the information derived from the code, and the class taps, used for sorting the target speech to one of plural classes, are extracted from the synthesized sound, the code or the information derived from the code. Based on the class taps, classification is carried out for finding the class of the target speech. From the class-based tap coefficients, as found on learning, the tap coefficients associated with the class of the target speech are acquired. The prediction values of the target speech are found using the prediction taps and the tap coefficients associated with the class of the target speech.

[0045] The learning device according to the present invention includes a prediction tap extraction unit for extracting prediction taps usable in predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, from the synthesized sound, the code or from the information derived from the code, a class tap extraction unit for extracting class taps usable for sorting the target speech to one of a plurality of classes, by way of classification, from the synthesized sound, the code or from the information derived from the code, a classification unit for finding the class of the target speech based on the class taps, and a learning unit for carrying out learning so that the prediction errors of prediction values of the speech of high sound quality, obtained on carrying out predictive calculations using the tap coefficients and the prediction taps, will be statistically smallest. With the speech of the high sound quality, the prediction values of which are to be found, as the target speech, the prediction taps, used for predicting the target speech, are extracted from the synthesized sound and the code or from the information derived from the code. The class of the target speech is found, based on the class taps, by way of classification. Then, learning is carried out so that the prediction errors of the prediction values of the target speech acquired on carrying out the predictive calculations using the tap coefficients and the prediction taps will be statistically smallest to find the tap coefficients on the class basis.

[0046] Other objects, features and advantages of the present invention will become more apparent from reading the embodiments of the present invention as shown in the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0047] FIG. 1 is a block diagram showing a typical transmitter forming a conventional portable telephone set.

[0048] FIG. 2 is a block diagram showing a typical receiver.

[0049] FIG. 3 is a block diagram showing a speech synthesis device embodying the present invention.

[0050] FIG. 4 is a block diagram showing a speech synthesis filter forming the speech synthesis device.

[0051] FIG. 5 is a flowchart for illustrating the processing of a speech synthesis device shown in FIG. 3.

[0052] FIG. 6 is a block diagram showing a learning device embodying the present invention.

[0053] FIG. 7 is a block diagram showing a prediction filter forming the learning device according to the present invention.

[0054] FIG. 8 is a flowchart for illustrating the processing by the learning device of FIG. 6.

[0055] FIG. 9 is a block diagram showing a transmission system embodying the present invention.

[0056] FIG. 10 is a block diagram showing a portable telephone set embodying the present invention.

[0057] FIG. 11 is a block diagram showing a receiver forming the portable telephone set.

[0058] FIG. 12 is a block diagram showing a modification of the learning device embodying the present invention.

[0059] FIG. 13 is a block diagram showing a typical structure of a computer embodying the present invention.

[0060] FIG. 14 is a block diagram showing another typical structure of a speech synthesis device embodying the present invention.

[0061] FIG. 15 is a block diagram showing a speech synthesis filter forming the speech synthesis device.

[0062] FIG. 16 is a flowchart for illustrating the processing of the speech synthesis device shown in FIG. 14.

[0063] FIG. 17 is a block diagram showing another modification of the learning device embodying the present invention.

[0064] FIG. 18 is a block diagram showing a prediction filter forming the learning device according to the present invention.

[0065] FIG. 19 is a flowchart for illustrating the processing of the learning device shown in FIG. 17.

[0066] FIG. 20 is a block diagram showing a transmission system embodying the present invention.

[0067] FIG. 21 is a block diagram for illustrating the portable telephone set embodying the present invention.

[0068] FIG. 22 is a block diagram showing the receiver forming the portable telephone set.

[0069] FIG. 23 is a block diagram showing still another modification of the learning device embodying the present invention.

[0070] FIG. 24 is a block diagram showing still another typical structure of a speech synthesis device embodying the present invention.

[0071] FIG. 25 is a block diagram showing a speech synthesis filter forming the speech synthesis device.

[0072] FIG. 26 is a flowchart for illustrating the processing of the speech synthesis device shown in FIG. 24.

[0073] FIG. 27 is a block diagram showing a further modification of the learning device embodying the present invention.

[0074] FIG. 28 is a block diagram showing a prediction filter forming the learning device according to the present invention.

[0075] FIG. 29 is a flowchart for illustrating the processing of the learning device shown in FIG. 27.

[0076] FIG. 30 is a block diagram showing a transmission system embodying the present invention.

[0077] FIG. 31 is a block diagram showing a portable telephone set embodying the present invention.

[0078] FIG. 32 is a block diagram showing a receiver forming the portable telephone set.

[0079] FIG. 33 is a block diagram showing a further modification of the learning device embodying the present invention.

[0080] FIG. 34 shows teacher and pupil data.

BEST MODE FOR CARRYING OUT THE INVENTION

[0081] Referring to the drawings, certain preferred embodiments of the present invention will be explained in detail.

[0082] The speech synthesis device, embodying the present invention, is configured as shown in FIG. 3, and is fed with code data obtained on multiplexing the residual code and the A code obtained in turn respectively on coding residual signals and linear prediction coefficients, to be supplied to a speech synthesis filter 44, by vector quantization. From the residual code and the A code, the residual signals and linear prediction coefficients are decoded, respectively, and fed to the speech synthesis filter 44, to generate the synthesized sound. The speech synthesis device executes predictive calculations, using the synthesized sound produced by the speech synthesis filter 44 and also using tap coefficients as found on learning, to find the high quality synthesized speech, that is the synthesized sound with improved sound quality.

[0083] With the speech synthesis device of the present invention, shown in FIG. 3, classification adaptive processing is used to decode the synthesized speech to high quality true speech, more precisely predicted values thereof.

[0084] The classification adaptive processing is comprised of classification processing and adaptive processing. By the classification processing, the data is classified depending on its characteristics and subjected to class-based adaptive processing. The adaptive processing uses the following technique:

[0085] That is, the adaptive processing finds predicted values of the true speech of high sound quality by, for example, the linear combination of the synthesized speech and preset tap coefficients.

[0086] Specifically, it is now contemplated to find predicted values E[y] of the true speech y of high quality by a model of one-dimensional linear combination, defined by a linear combination of a set of synthesized sounds, more precisely sample values thereof, x₁, x₂, . . . , and preset tap coefficients w₁, w₂, . . . , using, as teacher data, the true speech of high quality, more precisely the sample values thereof, and using, as pupil data, the synthesized speech obtained on coding the true speech of high quality into the L code, G code, I code and the A code, in accordance with the CELP system, and subsequently on decoding these codes by the receiver shown in FIG. 2. In this case, the prediction value E[y] may be represented by the following equation:

$E[y] = w_{1} x_{1} + w_{2} x_{2} + \cdots \qquad (6)$.

[0087] If, for generalizing the equation (6), a matrix W formed by a set of tap coefficients w_(j), a matrix X formed by a set of pupil data x_(ij) and a matrix Y′ formed by a set of prediction values E[y_(i)] are defined as: $X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1J} \\ x_{21} & x_{22} & \cdots & x_{2J} \\ \cdots & \cdots & \cdots & \cdots \\ x_{I1} & x_{I2} & \cdots & x_{IJ} \end{bmatrix}$, $W = \begin{bmatrix} w_{1} \\ w_{2} \\ \cdots \\ w_{J} \end{bmatrix}$, $Y' = \begin{bmatrix} E[y_{1}] \\ E[y_{2}] \\ \cdots \\ E[y_{I}] \end{bmatrix}$

[0088] the following observation equation:

XW=Y′  (7)

[0089] holds.

[0090] It is noted that the component x_(ij) of the matrix X denotes the j-th pupil data in the i-th set of pupil data (the set of pupil data used in predicting the i-th teacher data y_(i)) and that the component w_(j) of the matrix W denotes the tap coefficient by which the j-th pupil data in the set of pupil data is multiplied. It is also noted that y_(i) denotes the i-th teacher data and hence E[y_(i)] denotes the predicted value of the i-th teacher data. It is also noted that the suffix i of the component y_(i) of the matrix Y is omitted from y on the left side of the equation (6) and that the suffix i is similarly omitted from the component x_(ij) of the matrix X.
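The predictive calculation of equation (6) and the observation equation (7) amount to one inner product per set of pupil data. This can be illustrated by the following Python sketch, in which the function names `predict` and `predict_all` and all numerical values are hypothetical.

```python
def predict(taps, coeffs):
    """Equation (6): E[y] = w_1 * x_1 + w_2 * x_2 + ... for one set of taps."""
    return sum(w * x for w, x in zip(coeffs, taps))


def predict_all(X, W):
    """Equation (7): Y' = X W, one prediction value per row of X."""
    return [predict(row, W) for row in X]


# Hypothetical pupil data with I = 3 sets of J = 2 taps each.
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
W = [0.5, 0.25]             # hypothetical tap coefficients w_1, w_2
Y_pred = predict_all(X, W)  # prediction values E[y_1], E[y_2], E[y_3]
```

In the classification adaptive processing, a separate coefficient vector would be held for each class and the vector for the class of the target speech would be passed as `coeffs`; how the class is determined is outside the scope of this sketch.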

[0091] It is now contemplated to apply the least square method to this observation equation to find a predicted value E[y] close to the true sound y of high quality. If the matrix Y formed by a set of speech y of high sound quality as teacher data and the matrix E formed by a set of residual signals e of the prediction values E[y] for the speech y of high sound quality are defined by: $E = \begin{bmatrix} e_{1} \\ e_{2} \\ \cdots \\ e_{I} \end{bmatrix}$, $Y = \begin{bmatrix} y_{1} \\ y_{2} \\ \cdots \\ y_{I} \end{bmatrix}$

[0092] the following residual equation:

XW=Y+E  (8)

[0093] holds from the equation (7).

[0094] In this case, the tap coefficients w_(j) for finding theprediction value E[y] close to the true speech of high sound quality ymay be found by minimizing the square error$\sum\limits_{i - 1}^{I}e_{i}^{2}$

[0095] The tap coefficient for the case in which the above square error, differentiated with respect to the tap coefficient w_(j), is equal to zero, that is the tap coefficient w_(j) satisfying the following equation: $\begin{matrix}{{{e_{1}\frac{\partial e_{1}}{\partial w_{j}}} + {e_{2}\frac{\partial e_{2}}{\partial w_{j}}} + \ldots + {e_{I}\frac{\partial e_{I}}{\partial w_{j}}}} = {0\quad \left( {{j = 1},2,\ldots,J} \right)}} & (9)\end{matrix}$

[0096] represents an optimum value for finding the predicted value E[y]close to the true speech y of high sound quality.

[0097] First, the equation (8) is differentiated with respect to the tap coefficient w_(j) to obtain the following equation: $\begin{matrix}{{\frac{\partial e_{i}}{\partial w_{1}} = x_{i1}},{\frac{\partial e_{i}}{\partial w_{2}} = x_{i2}},\ldots,{\frac{\partial e_{i}}{\partial w_{J}} = {x_{iJ}\quad \left( {{i = 1},2,\ldots,I} \right)}}} & (10)\end{matrix}$

[0098] From the equations (9) and (10), the following equation (11): $\begin{matrix}{{{\sum\limits_{i = 1}^{I}{e_{i}x_{i1}}} = 0},{{\sum\limits_{i = 1}^{I}{e_{i}x_{i2}}} = 0},\ldots,{{\sum\limits_{i = 1}^{I}{e_{i}x_{iJ}}} = 0}} & (11)\end{matrix}$

[0099] is obtained.

[0100] Taking into account the relationships among pupil data x_(ij), tap coefficients w_(j), teacher data y_(i) and errors e_(i), in the residual equation (8), the following normal equations: $\begin{matrix}\left\{ \begin{matrix}{{{\left( {\sum\limits_{i = 1}^{I}{x_{i1}x_{i1}}} \right)w_{1}} + {\left( {\sum\limits_{i = 1}^{I}{x_{i1}x_{i2}}} \right)w_{2}} + \ldots + {\left( {\sum\limits_{i = 1}^{I}{x_{i1}x_{iJ}}} \right)w_{J}}} = \left( {\sum\limits_{i = 1}^{I}{x_{i1}y_{i}}} \right)} \\{{{\left( {\sum\limits_{i = 1}^{I}{x_{i2}x_{i1}}} \right)w_{1}} + {\left( {\sum\limits_{i = 1}^{I}{x_{i2}x_{i2}}} \right)w_{2}} + \ldots + {\left( {\sum\limits_{i = 1}^{I}{x_{i2}x_{iJ}}} \right)w_{J}}} = \left( {\sum\limits_{i = 1}^{I}{x_{i2}y_{i}}} \right)} \\\cdots \\{{{\left( {\sum\limits_{i = 1}^{I}{x_{iJ}x_{i1}}} \right)w_{1}} + {\left( {\sum\limits_{i = 1}^{I}{x_{iJ}x_{i2}}} \right)w_{2}} + \ldots + {\left( {\sum\limits_{i = 1}^{I}{x_{iJ}x_{iJ}}} \right)w_{J}}} = \left( {\sum\limits_{i = 1}^{I}{x_{iJ}y_{i}}} \right)}\end{matrix} \right. & (12)\end{matrix}$

[0101] are obtained.

[0102] If the matrix (co-variance matrix) A and the vector v are defined by: $A = \begin{bmatrix}{\sum\limits_{i = 1}^{I}{x_{i1}x_{i1}}} & {\sum\limits_{i = 1}^{I}{x_{i1}x_{i2}}} & \cdots & {\sum\limits_{i = 1}^{I}{x_{i1}x_{iJ}}} \\{\sum\limits_{i = 1}^{I}{x_{i2}x_{i1}}} & {\sum\limits_{i = 1}^{I}{x_{i2}x_{i2}}} & \cdots & {\sum\limits_{i = 1}^{I}{x_{i2}x_{iJ}}} \\\cdots & \cdots & \cdots & \cdots \\{\sum\limits_{i = 1}^{I}{x_{iJ}x_{i1}}} & {\sum\limits_{i = 1}^{I}{x_{iJ}x_{i2}}} & \cdots & {\sum\limits_{i = 1}^{I}{x_{iJ}x_{iJ}}}\end{bmatrix}$, $v = \begin{pmatrix}{\sum\limits_{i = 1}^{I}{x_{i1}y_{i}}} \\{\sum\limits_{i = 1}^{I}{x_{i2}y_{i}}} \\\vdots \\{\sum\limits_{i = 1}^{I}{x_{iJ}y_{i}}}\end{pmatrix}$

[0103] and the vector W is defined as shown above in connection with the equation (7), the normal equations shown by the equation (12) may be expressed as:

AW=v  (13).

[0104] A number of the normal equations equal to the number J of the tap coefficients w_(j) to be found may be established as the normal equations of (12) by providing a certain number of sets of the pupil data x_(ij) and teacher data y_(i). Consequently, optimum tap coefficients, herein the tap coefficients that minimize the square error, may be found by solving the equation (13) with respect to the vector W. However, it is noted that, for solving the equation (13), the matrix A in the equation (13) needs to be regular (non-singular), and that e.g., a sweep-out method (Gauss-Jordan elimination) may be used in the process for the solution.
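
By way of illustration only, the formation of the matrix A and the vector v of the equation (13), and the solution by the sweep-out method, may be sketched as follows. The function names, the tolerance eps and the toy data are assumptions introduced for this sketch and are not part of the disclosure.

```python
def build_normal_equations(X, y):
    """Accumulate A (sums of x_in * x_im) and v (sums of x_in * y_i)."""
    J = len(X[0])
    A = [[0.0] * J for _ in range(J)]
    v = [0.0] * J
    for row, target in zip(X, y):
        for n in range(J):
            v[n] += row[n] * target
            for m in range(J):
                A[n][m] += row[n] * row[m]
    return A, v


def gauss_jordan_solve(A, v, eps=1e-12):
    """Solve A w = v by Gauss-Jordan elimination with partial pivoting.

    Raises ValueError when A is numerically singular, i.e. not regular.
    """
    J = len(v)
    # Augmented matrix [A | v]; copying keeps the caller's data intact.
    M = [list(A[r]) + [v[r]] for r in range(J)]
    for col in range(J):
        pivot = max(range(col, J), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            raise ValueError("matrix A is not regular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [val / p for val in M[col]]
        for r in range(J):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][J] for r in range(J)]


# Toy data (assumed): teacher values generated exactly by w = (2, -1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
A, v = build_normal_equations(X, y)
w = gauss_jordan_solve(A, v)
```

In this toy case the pupil data determine the coefficients exactly, so the recovered w matches the generating coefficients up to rounding; with real speech data the solution is merely the least-square optimum.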

[0105] It is the adaptive processing that finds the optimum tap coefficients w_(j) and uses the so found optimum tap coefficients w_(j) to find the prediction value E[y] close to the true speech y of high sound quality using the equation (6).

[0106] If speech signals sampled at a high sampling frequency, or speech signals employing a larger number of allocated bits, are used as teacher data, while the synthesized sound, obtained on decoding a CELP-encoded version of speech signals obtained in turn on decimating the teacher data or re-quantizing them with a smaller number of bits, is used as pupil data, tap coefficients are obtained which give speech of high sound quality in which the prediction error in generating the speech signals sampled at the high sampling frequency, or the speech signals employing the larger number of allocated bits, is statistically minimized. In this case, synthesized speech of high sound quality may be produced.

[0107] In the speech synthesis device, shown in FIG. 3, code data,comprised of the A code and the residual code, may be decoded to thehigh sound quality speech by the above-described classification adaptiveprocessing.

[0108] That is, a demultiplexer (DEMUX) 41, supplied with code data, separates the frame-based A code and residual code from the code data supplied thereto. The demultiplexer 41 routes the A code to a filter coefficient decoder 42 and to a tap generator 46, while supplying the residual code to a residual codebook storage unit 43 and to the tap generator 46.

[0109] It is noted that the A code and the residual code, contained inthe code data in FIG. 3, are the codes obtained on vector quantization,with a preset codebook, of the linear prediction coefficients and theresidual signals obtained on LPC speech analysis.

[0110] The filter coefficient decoder 42 decodes the frame-based A code, supplied thereto from the demultiplexer 41, into linear prediction coefficients, based on the same codebook as that used in obtaining the A code, to supply the so decoded coefficients to a speech synthesis filter 44.

[0111] The residual codebook storage unit 43 decodes the frame-basedresidual code, supplied from the demultiplexer 41, into residualsignals, based on the same codebook as that used in obtaining theresidual code, to send the so decoded signals to a speech synthesisfilter 44.

[0112] Similarly to, for example, the speech synthesis filter 29 shownin FIG. 1, the speech synthesis filter 44 is an IIR type digital filter,and proceeds to filtering the residual signals from the residualcodebook storage unit 43, as input signals, using the linear predictioncoefficients from the filter coefficient decoder 42 as tap coefficientsof the IIR filter, to generate the synthesized sound, which then isrouted to a tap generator 45.

[0113] From sampled values of the synthesized speech, supplied from thespeech synthesis filter 44, the tap generator 45 extracts what is to beprediction taps used in prediction calculations in a prediction unit 49which will be explained subsequently. That is, the tap generator 45uses, as prediction taps, the totality of sampled values of thesynthesized sound of a frame of interest, that is a frame the predictionvalues of the high quality speech of which are being found. The tapgenerator 45 routes the prediction taps to a prediction unit 49.

[0114] The tap generator 46 extracts what are to become class taps fromthe frame- or subframe-based A code and residual code, supplied from thedemultiplexer 41. That is, the tap generator 46 renders the totality ofthe A code and the residual code the class taps, and routes the classtaps to a classification unit 47.

[0115] The pattern for constituting the prediction tap or class tap isnot limited to the aforementioned pattern.

[0116] Meanwhile, the tap generator 46 is able to extract the class tapsnot only from the A and residual codes, but also from the linearprediction coefficients, output by the filter coefficient decoder 42,residual signals output by the residual codebook storage unit 43 andfrom the synthesized sound output by the speech synthesis filter 44.

[0117] Based on the class taps from the tap generator 46, theclassification unit 47 classifies the speech, more precisely sampledvalues of the speech, of the frame of interest, and outputs theresulting class code corresponding to the so obtained class to acoefficient memory 48.

[0118] It is possible for the classification unit 47 to output a bit string itself, forming the A code and the residual code of the frame of interest, as the class code.
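
As a minimal sketch of using the bit strings directly, the A code and the residual code may be concatenated into a single integer serving as the class code. The function name class_code and the bit widths used below (a 3-bit A code and a 4-bit residual code) are hypothetical and chosen only for illustration.

```python
def class_code(a_code: int, residual_code: int, residual_bits: int) -> int:
    """Concatenate the A code and the residual code into one class code."""
    if not 0 <= residual_code < (1 << residual_bits):
        raise ValueError("residual code does not fit in residual_bits")
    # The A code occupies the upper bits, the residual code the lower bits.
    return (a_code << residual_bits) | residual_code


# Assumed widths: 3-bit A code and 4-bit residual code, giving 128 classes.
code = class_code(a_code=0b101, residual_code=0b0011, residual_bits=4)
```

With such a scheme no computation beyond bit shifting is needed for the classification, at the cost of a number of classes that grows exponentially with the total code length.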

[0119] The coefficient memory 48 holds class-based tap coefficients,obtained on carrying out the learning in the learning device of FIG. 6,which will be explained subsequently. The coefficient memory 48 outputsthe tap coefficients stored in an address associated with the class codeoutput by the classification unit 47 to the prediction unit 49.

[0120] If N samples of high sound quality are found for each frame, Nsets of tap coefficients are required in order to find N speech samplesfor the frame of interest by the predictive calculations of the equation(6). Thus, in the present case, N sets of tap coefficients are stored inthe coefficient memory 48 for the address associated with one classcode.

[0121] The prediction unit 49 acquires the prediction taps output by thetap generator 45 and the tap coefficients output by the coefficientmemory 48 and, using the prediction taps and tap coefficients, performslinear predictive calculations (sum of product calculations) shown inthe equation (6) to find predicted values of the high sound qualityspeech of the frame of interest to output the resulting values to a D/Aconverter 50.

[0122] The coefficient memory 48 outputs N sets of tap coefficients forfinding N samples of the speech of the frame of interest, as describedabove. Using the prediction taps of the respective samples and the setof tap coefficients corresponding to the sampled values, the predictionunit 49 carries out the sum-of-product processing of the equation (6).

[0123] The D/A converter 50 D/A converts the speech, more preciselypredicted values of the speech, from the prediction unit 49, fromdigital signals into corresponding analog signals, to send the resultingsignals to the loudspeaker 51 as output.

[0124]FIG. 4 shows an illustrative structure of the speech synthesisfilter 44 shown in FIG. 3.

[0125] In FIG. 4, the speech synthesis filter 44 uses P-dimensional linear prediction coefficients and is made up of a single adder 61, P delay circuits (D) 62 ₁ to 62 _(p) and P multipliers 63 ₁ to 63 _(p).

[0126] In the multipliers 63 ₁ to 63 _(p) are set P-dimensional linearprediction coefficients α₁, α₂, . . . , α_(p), sent from the filtercoefficient decoder 42, respectively, whereby the speech synthesisfilter 44 carries out the calculations in accordance with the equation(4) to generate the synthesized sound.

[0127] That is, the residual signals e, output by the residual codebook storage unit 43, are sent via the adder 61 to the delay circuit 62 ₁. Each delay circuit 62 _(p) delays the input signal thereto by one sample of the residual signals and outputs the delayed signal to the downstream side delay circuit 62 _(p+1) and to the multiplier 63 _(p). The multiplier 63 _(p) multiplies the output of the delay circuit 62 _(p) with the linear prediction coefficient α_(p) set therein and outputs the resulting product to the adder 61.

[0128] The adder 61 adds all outputs of the multipliers 63 ₁ to 63 _(p) and the residual signals e, and sends the result of the addition to the delay circuit 62 ₁ while outputting it as being the result of speech synthesis (synthesized sound).
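
The recursive structure of FIG. 4 may be sketched as below. The equation (4) is not reproduced in this portion of the text, so the sketch assumes the sign convention of the equation (14), E = (1 + α₁z⁻¹ + . . . + α_(p)z^(−p))S, under which the products are subtracted from the residual (equivalently, the multipliers hold the negated coefficients). The function name synthesize is hypothetical.

```python
def synthesize(residual, alpha):
    """All-pole (IIR) synthesis: s[n] = e[n] - sum_k alpha[k-1] * s[n-k].

    `alpha` holds the coefficients alpha_1 .. alpha_P; the sign convention
    is an assumption taken from equation (14).
    """
    p = len(alpha)
    delay = [0.0] * p          # delay[k-1] holds s[n-k], like circuits 62_1..62_P
    out = []
    for e in residual:
        s = e - sum(a * d for a, d in zip(alpha, delay))
        out.append(s)
        delay = ([s] + delay)[:p]   # shift the newest output into the delay line
    return out


# Impulse response of a first-order filter with alpha_1 = -0.5 (toy values).
s = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```

The impulse response decays geometrically here because each output sample is fed back through the delay line, which is what makes the filter of the IIR type.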

[0129] Referring to the flowchart of FIG. 5, the speech synthesis of thespeech synthesis device of FIG. 3 is now explained.

[0130] The demultiplexer 41 sequentially separates frame-based A codeand residual code to send the separated codes to the filter coefficientdecoder 42 and to the residual codebook storage unit 43. Thedemultiplexer 41 sends the A code and the residual code to the tapgenerator 46.

[0131] The filter coefficient decoder 42 sequentially decodes theframe-based A code, supplied thereto from the demultiplexer 41, to sendthe resulting decoded coefficients to the speech synthesis filter 44.The residual codebook storage unit 43 sequentially decodes theframe-based residual codes, supplied from the demultiplexer 41, intoresidual signals, which are then sent to the speech synthesis filter 44.

[0132] Using the residual signal and the linear prediction coefficients,supplied thereto, the speech synthesis filter 44 carries out theprocessing in accordance with the equation (4) to generate thesynthesized speech of the frame of interest. This synthesized sound issent to the tap generator 45.

[0133] The tap generator 45 sequentially renders the frame of the synthesized sound, sent thereto, a frame of interest and, at step S1, generates prediction taps from sample values of the synthesized sound supplied from the speech synthesis filter 44, to output the so generated prediction taps to the prediction unit 49. Also at step S1, the tap generator 46 generates the class taps from the A code and the residual code supplied from the demultiplexer 41 to output the so generated class taps to the classification unit 47.

[0134] At step S2, the classification unit 47 carries out the classification, based on the class taps, supplied from the tap generator 46, to send the resulting class codes to the coefficient memory 48. The program then moves to step S3.

[0135] At step S3, the coefficient memory 48 reads out the tap coefficients stored at the address corresponding to the class codes supplied from the classification unit 47, to send the resulting tap coefficients to the prediction unit 49.

[0136] The program then moves to step S4 where the prediction unit 49 acquires tap coefficients output by the coefficient memory 48 and, using the tap coefficients and the prediction taps from the tap generator 45, carries out the sum-of-product processing shown in the equation (6) to produce predicted values of the high sound quality speech of the frame of interest. The high sound quality speech is sent from the prediction unit 49 via the D/A converter 50 to the loudspeaker 51, from which it is output.

[0137] If the speech of the high sound quality of the frame of interesthas been acquired at the prediction unit 49, the program moves to stepS5 where it is verified whether or not there is any frame to beprocessed as the frame of interest. If it is verified that there isstill a frame to be processed as the frame of interest, the programreverts to step S1 and repeats similar processing with the frame to bethe next frame of interest as a new frame of interest. If it is verifiedat step S5 that there is no frame to be processed as the frame ofinterest, the speech synthesis processing is terminated.
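
The flow of steps S1 to S5 may be sketched as follows, under the assumption that each frame arrives as a tuple of synthesized samples, A code and residual code, and that the coefficient memory is a mapping from class code to N sets of tap coefficients. The names decode_frames, classify and memory, as well as the toy coefficient values, are hypothetical.

```python
def decode_frames(frames, classify, coefficient_memory):
    """Run steps S1 to S5 over an iterable of frames.

    Each frame is a tuple (synth_samples, a_code, residual_code).
    """
    decoded = []
    for synth_samples, a_code, residual_code in frames:    # S5: next frame, if any
        prediction_taps = list(synth_samples)               # S1: prediction taps
        class_taps = (a_code, residual_code)                # S1: class taps
        class_code = classify(class_taps)                   # S2: classification
        coefficient_sets = coefficient_memory[class_code]   # S3: read coefficients
        decoded.append([                                    # S4: equation (6)
            sum(w * x for w, x in zip(w_set, prediction_taps))
            for w_set in coefficient_sets
        ])
    return decoded


# Assumed two-class memory with N = 2 coefficient sets of J = 2 taps each.
memory = {
    0: [[1.0, 0.0], [0.0, 1.0]],   # passes the synthesized samples through
    1: [[0.5, 0.5], [0.5, 0.5]],   # averages the two samples
}
frames = [([2.0, 4.0], 0, 0), ([2.0, 4.0], 0, 1)]
result = decode_frames(frames, classify=lambda taps: taps[1],
                       coefficient_memory=memory)
```

Each of the N coefficient sets yields one output sample of the frame, so the same prediction taps are reused N times per frame.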

[0138] Referring to FIG. 6, an instance of a learning device foreffecting the learning processing of the tap coefficients to be storedin the coefficient memory 48 of FIG. 3 is now explained.

[0139] The learning device shown in FIG. 6 is supplied with digitalspeech signals for learning, from one preset frame to another. Thesedigital speech signals for learning are sent to an LPC analysis unit 71and to a prediction filter 74. The digital speech signals for learningare also supplied as teacher data to a normal equation addition circuit81.

[0140] The LPC analysis unit 71 sequentially renders the frame of thespeech signals, supplied thereto, a frame of interest, and LPC-analyzesthe speech signals of the frame of interest to find p-dimensional linearprediction coefficients which are then sent to the prediction filter 74and to a vector quantizer 72.

[0141] The vector quantizer 72 holds a codebook, associating the code vectors, having linear prediction coefficients as components, with the codes. Based on the codebook, the vector quantizer 72 vector-quantizes the feature vectors, constituted by the linear prediction coefficients of the frame of interest from the LPC analysis unit 71, and sends the A code, obtained as a result of the vector quantization, to a filter coefficient decoder 73 and to a tap generator 79.

[0142] The filter coefficient decoder 73 holds the same codebook as thatheld by the vector quantizer 72 and, based on the codebook, decodes theA code from the vector quantizer 72 into linear prediction coefficientswhich are routed to a speech synthesis filter 77. The filter coefficientdecoder 42 of FIG. 3 is constructed similarly to the filter coefficientdecoder 73 of FIG. 6.

[0143] The prediction filter 74 carries out the processing, inaccordance with the aforementioned equation (1), using the speechsignals of the frame of interest, supplied thereto, and the linearprediction coefficients from the LPC analysis unit 71, to find theresidual signals of the frame of interest, which then are sent to vectorquantizer 75.

[0144] If the Z-transforms of s_(n) and e_(n) in the equation (1) areexpressed as S and E, respectively, the equation (1) may be representedby the following equation:

E=(1+α₁ z ⁻¹+α₂ z ⁻²+ . . . +α_(p) z ^(−p))S.  (14)

[0145] The prediction filter 74 for finding the residual signal e fromthe equation (14) may be constructed as a digital filter of the FIR(finite impulse response) type.

[0146]FIG. 7 shows an illustrative structure of the prediction filter74.

[0147] The prediction filter 74 is fed with p-dimensional linearprediction coefficients from the LPC analysis unit 71, so that theprediction filter 74 is made up of p delay circuits D 91 ₁ to 91 _(p), pmultipliers 92 ₁ to 92 _(p) and one adder 93.

[0148] In the multipliers 92 ₁ to 92 _(p) are set p-dimensional linearprediction coefficients α₁, α₂, . . . , α_(p) supplied from the LPCanalysis unit 71.

[0149] On the other hand, the speech signals s of the frame of interest are sent to the delay circuit 91 ₁ and to the adder 93. Each delay circuit 91 _(p) delays the input signal thereto by one sample of the speech signals and outputs the delayed signal to the downstream side delay circuit 91 _(p+1) and to the multiplier 92 _(p). The multiplier 92 _(p) multiplies the output of the delay circuit 91 _(p) with the linear prediction coefficient set therein, to send the resulting product value to the adder 93.

[0150] The adder 93 adds all of the outputs of the multipliers 92 ₁ to 92 _(p) and the speech signals s, and outputs the result of the addition as the residual signals e.
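
The FIR structure of FIG. 7 follows directly from the equation (14), each residual sample being the current speech sample plus the weighted sum of the p preceding speech samples. A minimal sketch, with the hypothetical function name prediction_filter and toy input values, is given below.

```python
def prediction_filter(speech, alpha):
    """FIR analysis per equation (14): e[n] = s[n] + sum_k alpha[k-1] * s[n-k]."""
    p = len(alpha)
    delay = [0.0] * p          # delay[k-1] holds s[n-k], like circuits 91_1..91_p
    residual = []
    for s in speech:
        residual.append(s + sum(a * d for a, d in zip(alpha, delay)))
        delay = ([s] + delay)[:p]   # shift the newest input into the delay line
    return residual


# Toy input: a geometrically decaying signal and alpha_1 = -0.5 (assumed values).
e = prediction_filter([1.0, 0.5, 0.25, 0.125], [-0.5])
```

Since only past input samples enter the sum, there is no feedback path, and the residual for a signal that the coefficients predict perfectly is zero after the first sample.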

[0151] Returning to FIG. 6, the vector quantizer 75 holds a codebook, associating the code vectors, having sample values of the residual signals as components, with the codes. Based on this codebook, residual vectors formed by the sample values of the residual signals of the frame of interest, from the prediction filter 74, are vector quantized, and the residual codes, obtained as a result of the vector quantization, are sent to a residual codebook storage unit 76 and to the tap generator 79.

[0152] The residual codebook storage unit 76 holds the same codebook asthat held by the vector quantizer 75 and, based on the codebook, decodesthe residual code from the vector quantizer 75 into residual signalswhich are routed to the speech synthesis filter 77. The residualcodebook storage unit 43 of FIG. 3 is constructed similarly to theresidual codebook storage unit 76 of FIG. 6.

[0153] A speech synthesis filter 77 is an IIR filter constructed similarly to the speech synthesis filter 44 of FIG. 3, and filters the residual signal from the residual codebook storage unit 76 as an input signal, with the linear prediction coefficients from the filter coefficient decoder 73 as tap coefficients of the IIR filter, to generate the synthesized sound, which then is routed to a tap generator 78.

[0154] Similarly to the tap generator 45 of FIG. 3, the tap generator 78 forms prediction taps from the synthesized sound supplied from the speech synthesis filter 77, to send the so formed prediction taps to the normal equation addition circuit 81. Similarly to the tap generator 46 of FIG. 3, the tap generator 79 forms class taps from the A code and the residual code, sent from the vector quantizers 72 and 75, respectively, to send the class taps to a classification unit 80.

[0155] Similarly to the classification unit 47 of FIG. 3, theclassification unit 80 carries out the classification, based on theclass taps, supplied thereto, to send the resulting class codes to thenormal equation addition circuit 81.

[0156] The normal equation addition circuit 81 performs summation on the speech for learning, which is the high sound quality speech of the frame of interest, as teacher data, and on the sample values of the synthesized sound output by the speech synthesis filter 77, forming the prediction taps from the tap generator 78, as pupil data.

[0157] Using the prediction taps (pupil data) supplied from the tap generator 78, the normal equation addition circuit 81 carries out, for each class corresponding to the class code supplied from the classification unit 80, the multiplication of the pupil data with one another (x_(in)x_(im)), as components in the matrix A of the equation (13), and operations equivalent to summation (Σ).

[0158] Using the pupil data, that is sampled values of the synthesizedsound output from the speech synthesis filter 77, and teacher data, thatis sampled values of the high sound quality speech of the frame ofinterest, the normal equation addition circuit 81 carries out theprocessing equivalent to multiplication (x_(in)y_(i)), and summation (Σ)of the pupil data and the teacher data, as components in the vector v ofthe equation (13), for each class corresponding to the class codesupplied from the classification unit 80.

[0159] The normal equation addition circuit 81 carries out the above summation, using all of the speech frames for learning, supplied thereto, to establish the normal equation, shown in the equation (13), for each class.

[0160] A tap coefficient decision circuit 82 solves the normal equation, generated in the normal equation addition circuit 81, from class to class, to find tap coefficients for the respective classes. The tap coefficients, thus found, are sent to the address associated with each class of a coefficient memory 83.

[0161] Depending on the speech signals, provided as speech signals forlearning, there are occasions wherein, in a class or classes, a numberof the normal equations required to find tap coefficients cannot beproduced in the normal equation addition circuit 81. For such class(es),the tap coefficient decision circuit 82 outputs default tapcoefficients.

[0162] The coefficient memory 83 memorizes the class-based tapcoefficients, supplied from the tap coefficient decision circuit 82, inan address associated with the class.

[0163] Referring to the flowchart of FIG. 8, the learning processing bythe learning device of FIG. 6 is now explained.

[0164] The learning device is fed with speech signals for learning,which are sent to both the LPC analysis unit 71 and to the predictionfilter 74, while being sent as teacher data to the normal equationaddition circuit 81. At step S11, pupil data are generated from thespeech signals for learning.

[0165] That is, the LPC analysis unit 71 sequentially renders the framesof the speech signals for learning the frames of interest andLPC-analyzes the speech signals of the frames of interest to findp-dimensional linear prediction coefficients which are sent to thevector quantizer 72. The vector quantizer 72 vector-quantizes thefeature vectors formed by the linear prediction coefficients of theframe of interest, from the LPC analysis unit 71, and sends the A coderesulting from the vector quantization to the filter coefficient decoder73 and to the tap generator 79. The filter coefficient decoder 73decodes the A code from the vector quantizer 72 into linear predictioncoefficients which are sent to the speech synthesis filter 77.

[0166] On the other hand, the prediction filter 74, which has received the linear prediction coefficients of the frame of interest from the LPC analysis unit 71, carries out the processing of the equation (1), using the linear prediction coefficients and the speech signals for learning of the frame of interest, to find the residual signals of the frame of interest to send the so found residual signals to the vector quantizer 75. The vector quantizer 75 vector-quantizes the residual vector formed by the sample values of the residual signals of the frame of interest from the prediction filter 74 to send the residual code obtained on vector quantization to the residual codebook storage unit 76 and to the tap generator 79. The residual codebook storage unit 76 decodes the residual code from the vector quantizer 75 into residual signals which are then supplied to the speech synthesis filter 77.

[0167] On receipt of the linear prediction coefficients and the residualsignals, the speech synthesis filter 77 performs speech synthesis, usingthe linear prediction coefficients and the residual signals, to outputthe resulting synthesized signals as pupil data to the tap generator 78.

[0168] The program then moves to step S12 where the tap generator 78generates prediction taps from the synthesized sound supplied from thespeech synthesis filter 77, while the tap generator 79 generates classtaps from the code A from the vector quantizer 72 and from the residualcode from the vector quantizer 75. The prediction taps are sent to thenormal equation addition circuit 81, whilst the class taps are routed tothe classification unit 80.

[0169] At step S13, the classification unit 80 then performsclassification based on the class taps from the tap generator 79 toroute the resulting class code to the normal equation addition circuit81.

[0170] The program then moves to step S14 where the normal equationaddition circuit 81 carries out the aforementioned addition to thematrix A and the vector v of the equation (13), for the sample values ofthe speech of the high sound quality of the frame of interest as teacherdata supplied thereto, and the prediction taps, more precisely thesampled values of the synthesized sound making up the prediction taps,as pupil data from the tap generator 78 for the class supplied from theclassification unit 80. The program then moves to step S15.

[0171] At step S15, it is verified whether or not there are any speechsignals for learning to be processed as the frame of interest. If it isverified at step S15 that there are any speech signals for learning tobe processed as the frame of interest, the program reverts to step S11to repeat the similar processing, with the sequentially next frames asthe new frame of interest.

[0172] If it is found at step S15 that there is no speech signal forlearning of the frame to be processed as the frame of interest, that isif a normal equation has been obtained for each class in the normalequation addition circuit 81, the program moves to step S16 where thetap coefficient decision circuit 82 solves the normal equation generatedfrom class to class to find the tap coefficients for each class. The sofound tap coefficients are sent to the address associated with eachclass in a coefficient memory 83 for storage therein to terminate theprocessing.
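
The class-by-class summation of step S14 and the solution of step S16, including the fall-back to default tap coefficients for classes with too few samples, may be sketched as follows. For brevity the sketch handles one teacher sample per call rather than N samples per frame; the class and function names, the tolerance eps and the default coefficients are assumptions made for illustration.

```python
from collections import defaultdict


class NormalEquationAccumulator:
    """Per-class running sums for the matrix A and vector v of equation (13)."""

    def __init__(self, num_taps):
        self.num_taps = num_taps
        self.A = defaultdict(lambda: [[0.0] * num_taps for _ in range(num_taps)])
        self.v = defaultdict(lambda: [0.0] * num_taps)

    def add(self, class_code, taps, teacher):
        """Step S14: add one (pupil taps, teacher sample) pair to its class."""
        A, v = self.A[class_code], self.v[class_code]
        for n in range(self.num_taps):
            v[n] += taps[n] * teacher
            for m in range(self.num_taps):
                A[n][m] += taps[n] * taps[m]


def solve(A, v, eps=1e-12):
    """Gauss-Jordan elimination; returns None when A is not regular."""
    J = len(v)
    M = [list(A[r]) + [v[r]] for r in range(J)]
    for col in range(J):
        pivot = max(range(col, J), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(J):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[J] for row in M]


def decide_tap_coefficients(acc, default):
    """Step S16: solve each class, falling back to default coefficients."""
    table = {}
    for class_code in acc.A:
        w = solve(acc.A[class_code], acc.v[class_code])
        table[class_code] = w if w is not None else list(default)
    return table


acc = NormalEquationAccumulator(num_taps=2)
acc.add(0, [1.0, 0.0], 2.0)
acc.add(0, [0.0, 1.0], 3.0)
acc.add(1, [1.0, 1.0], 5.0)   # a single sample cannot determine two taps
table = decide_tap_coefficients(acc, default=[1.0, 0.0])
```

Class 1 in this toy run receives only one sample, so its matrix A is singular and the default coefficients are stored instead.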

[0173] The class-based tap coefficients, thus stored in the coefficientmemory 83, are stored in this manner in the coefficient memory 48 ofFIG. 3.

[0174] Thus, since the tap coefficients stored in the coefficient memory48 of FIG. 3 are found in this manner by carrying out the learning insuch a manner that the prediction error of the prediction values of thespeech of the high sound quality, that is the square error, will bestatistically minimum, the speech output by the prediction unit 49 ofFIG. 3 is of high sound quality in which the distortion of thesynthesized sound output by the speech synthesis filter 44 has beenreduced or eliminated.

[0175] Meanwhile, if, in the speech synthesis device of FIG. 3, the class taps are to be extracted by e.g., the tap generator 46 from the linear prediction coefficients or the residual signals, it is necessary to have the tap generator 79 of FIG. 6 extract the similar class taps from the linear prediction coefficients output by the filter coefficient decoder 73 and from the residual signals output by the residual codebook storage unit 76. However, if class taps are extracted even from e.g., the linear prediction coefficients, the number of the taps is increased. So, the classification preferably is to be carried out by compressing the class taps by, for example, the vector quantization. Meanwhile, if the classification is to be performed solely by the residual code and the A code, the load needed in classification processing may be relieved because the array of bit strings of the residual code and the A code can directly be used as the class code.
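
Compression of the class taps by vector quantization may be sketched as a nearest-neighbor search over a codebook, the index of the nearest code vector serving as the class code. The squared Euclidean distance, the function name vector_quantize and the three-entry codebook below are assumptions; the text does not specify the distance measure or the codebook design.

```python
def vector_quantize(vector, codebook):
    """Return the index of the code vector nearest to `vector`.

    Nearness is measured by squared Euclidean distance (an assumption).
    """
    def distance(code_vector):
        return sum((a - b) ** 2 for a, b in zip(vector, code_vector))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


# Hypothetical codebook of three 2-dimensional code vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
class_index = vector_quantize([0.9, 1.2], codebook)
```

The number of classes is then bounded by the codebook size, regardless of how many values make up the class taps.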

[0176] An instance of the transmission system embodying the presentinvention is explained with reference to FIG. 9. The system herein meansa set of logically arrayed plural devices, while it does not matterwhether or not the respective devices are in the same casing.

[0177] In the transmission system shown in FIG. 9, the portabletelephone sets 101 ₁, 101 ₂ perform radio transmission and receipt withbase stations 102 ₁, 102 ₂, respectively, while the base stations 102 ₁,102 ₂ perform transmission and receipt with an exchange station 103 toenable speech transmission and receipt of speech between the portabletelephone sets 101 ₁, 101 ₂ with the aid of the base stations 102 ₁, 102₂ and the exchange station 103. The base stations 102 ₁, 102 ₂ may bethe same as or different from each other.

[0178] The portable telephone sets 101 ₁, 101 ₂ are referred to below asa portable telephone set 101, unless there is specified necessity formaking distinction between the sets.

[0179]FIG. 10 shows an illustrative structure of the portable telephoneset 101 shown in FIG. 9.

[0180] An antenna 111 receives electrical waves from the base stations102 ₁, 102 ₂ to send the received signals to a modem 112 as well as tosend the signals from the modem 112 to the base stations 102 ₁, 102 ₂ aselectrical waves. The modem 112 demodulates the signals from the antenna111 to send the resulting code data explained with reference to FIG. 1to a receipt unit 114. The modem 112 also is configured for modulatingthe code data from the transmitter 113 as shown in FIG. 1 and sends theresulting modulated signal to the antenna 111. The transmitter 113 isconfigured similarly to the transmitter shown in FIG. 1 and codes theuser's speech input thereto into code data which is supplied to themodem 112. The receipt unit 114 receives the code data from the modem112 to decode and output the speech of high sound quality similar tothat obtained in the speech synthesis device of FIG. 3.

[0181] That is, FIG. 11 shows an illustrative structure of the receiptunit 114 of FIG. 10. In the drawing, parts or components correspondingto those shown in FIG. 2 are depicted by the same reference numerals andare not explained specifically.

[0182] A tap generator 121 is fed with the synthesized sound output by aspeech synthesis unit 29. From the synthesized sound, the tap generator121 extracts what are to be prediction taps (sampled values), which arethen routed to a prediction unit 125.

[0183] A tap generator 122 is fed with frame-based or subframe-based L, G, I and A codes, output by a channel decoder 21. The tap generator 122 is also fed with residual signals from the operating unit 28, while also being fed with linear prediction coefficients from a filter coefficient decoder 25. The tap generator 122 extracts what are to be class taps, from the L, G, I and A codes, residual signals and the linear prediction coefficients, supplied thereto, to route the extracted class taps to a classification unit 123.

[0184] The classification unit 123 carries out classification, based on the class taps supplied from the tap generator 122, to route the class codes as being the results of the classification to a coefficient memory 124.

[0185] If the class taps are formed from the L, G, I and A codes,residual signals and the linear prediction coefficients, andclassification is carried out based on these class taps, the number ofthe classes obtained on classification tends to be enormous. Thus, it isalso possible for the classification unit 123 to output the codes,obtained on vector quantization of the vectors having the L, G, I and Acodes, residual signals and the linear prediction coefficients, ascomponents, as being the results of the classification.

[0186] The coefficient memory 124 memorizes the class-based tap coefficients, obtained on learning by the learning device of FIG. 12, as later explained, and routes the tap coefficients, stored in the address associated with the class code output by the classification unit 123, to the prediction unit 125.

[0187] Similarly to the prediction unit 49 of FIG. 3, the prediction unit 125 acquires the prediction taps output by the tap generator 121 and the tap coefficients output by the coefficient memory 124, and performs the linear predictive calculations of the equation (6) using the prediction taps and the tap coefficients. In this manner, the prediction unit 125 finds the speech of high sound quality of the frame of interest, more precisely, prediction values thereof, and sends the values so found, as the result of speech decoding, to a D/A converter 30.

[0188] The receipt unit 114, designed as described above, performs processing basically the same as the processing complying with the flowchart of FIG. 5 to output the synthesized sound of high sound quality as being the result of speech decoding.

[0189] That is, the channel decoder 21 separates the L, G, I and A codes from the code data supplied thereto, and sends the codes so separated to the adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and the filter coefficient decoder 25, respectively. The L, G, I and A codes are also sent to the tap generator 122.

[0190] The adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and the operating units 26 to 28 perform processing similar to that performed in the adaptive codebook storage unit 9, gain decoder 10, excitation codebook storage unit 11 and the operating units 12 to 14 of FIG. 1 to decode the L, G and I codes into residual signals e. These residual signals are routed to the speech synthesis unit 29 and to the tap generator 122.

[0191] As explained with reference to FIG. 1, the filter coefficient decoder 25 decodes the A codes supplied thereto into linear prediction coefficients, which are routed to the speech synthesis unit 29 and to the tap generator 122. Using the residual signals from the operating unit 28 and the linear prediction coefficients supplied from the filter coefficient decoder 25, the speech synthesis unit 29 synthesizes the speech, and sends the resulting synthesized sound to the tap generator 121.

[0192] Using a frame of the synthesized sound, output from the speech synthesis unit 29, as the frame of interest, the tap generator 121 at step S1 generates prediction taps from the synthesized sound of the frame of interest, and sends the prediction taps so generated to the prediction unit 125. Also at step S1, the tap generator 122 generates class taps from the L, G, I and A codes, residual signals and the linear prediction coefficients supplied thereto, and sends these to the classification unit 123.

[0193] The program then moves to step S2, where the classification unit 123 carries out the classification based on the class taps sent from the tap generator 122 and sends the resulting class codes to the coefficient memory 124. The program then moves to step S3.

[0194] At step S3, the coefficient memory 124 reads out the tap coefficients corresponding to the class codes supplied from the classification unit 123, and sends the tap coefficients so read out to the prediction unit 125.

[0195] The program moves to step S4, where the prediction unit 125 acquires the tap coefficients output by the coefficient memory 124, and carries out sum-of-products processing in accordance with the equation (6), using the tap coefficients and the prediction taps from the tap generator 121, to acquire prediction values of the speech of high sound quality of the frame of interest.
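Steps S3 and S4 can be illustrated by the following minimal sketch. It assumes that the equation (6) is the sum of products y = w_1 x_1 + ... + w_N x_N of the tap coefficients w and the prediction taps x, and it models the coefficient memory as a dictionary keyed by the class code. The function names and the toy coefficient values are hypothetical.

```python
from typing import Dict, List, Sequence


def sum_of_products(prediction_taps: Sequence[float],
                    tap_coefficients: Sequence[float]) -> float:
    """Assumed form of equation (6): y = sum of w_i * x_i."""
    if len(prediction_taps) != len(tap_coefficients):
        raise ValueError("prediction taps and tap coefficients differ in number")
    return sum(w * x for w, x in zip(tap_coefficients, prediction_taps))


def predict_for_class(prediction_taps: Sequence[float],
                      class_code: int,
                      coefficient_memory: Dict[int, List[float]]) -> float:
    # Step S3: read out the tap coefficients stored at the address
    # associated with the class code.
    tap_coefficients = coefficient_memory[class_code]
    # Step S4: sum-of-products processing with the prediction taps.
    return sum_of_products(prediction_taps, tap_coefficients)


# Toy coefficient memory holding two classes of two taps each.
coefficient_memory = {0: [0.5, 0.5], 1: [1.0, -1.0]}
predicted = predict_for_class([2.0, 4.0], 1, coefficient_memory)
```

In an actual device one such calculation would be carried out for every sample of the frame of interest, with the prediction taps taken from the synthesized sound.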

[0196] The speech of high sound quality, obtained as described above, is sent from the prediction unit 125 through the D/A converter 30 to the loudspeaker 31, which then outputs the speech of high sound quality.

[0197] After the processing at step S4, the program moves to step S5, where it is verified whether or not there is any further frame to be processed as the frame of interest. If there is such a frame, the program reverts to step S1, where similar processing is repeated with the next frame as the new frame of interest. If it is found at step S5 that there is no frame to be processed as the frame of interest, the processing is terminated.

[0198] FIG. 12 shows an instance of a learning device adapted for carrying out the processing of learning the tap coefficients memorized in the coefficient memory 124 of FIG. 11.

[0199] In the learning device of FIG. 12, the components from a microphone 201 to a code decision unit 215 are constructed similarly to the microphone 1 to the code decision unit 15 of FIG. 1. The microphone 201 is fed with speech signals for learning, so that the components from the microphone 201 to the code decision unit 215 perform the same processing on the speech signals for learning as that in FIG. 1.

[0200] A tap generator 131 is fed with the synthesized sound output by a speech synthesis filter 206 when a minimum square error decision unit 208 has verified the square error to be smallest. Meanwhile, a tap generator 132 is fed with the L, G, I and A codes output when the definite signal has been received by the code decision unit 215 from the minimum square error decision unit 208. The tap generator 132 is also fed with the linear prediction coefficients output by the vector quantizer 205, as components of the code vectors (centroid vectors) corresponding to the A code obtained on vector quantization of the linear prediction coefficients found at an LPC analysis unit 204, and with the residual signals output by the operating unit 214, both being those that prevail when the square error in the minimum square error decision unit 208 has become minimum. A normal equation summation circuit 134 is fed with the speech output by an A/D converter 202 as teacher data.

[0201] From the synthesized sound output by the speech synthesis filter 206, the tap generator 131 generates the same prediction taps as those of the tap generator 121 of FIG. 11, and routes the prediction taps so generated as pupil data to the normal equation summation circuit 134.

[0202] From the L, G, I and A codes from the code decision unit 215, the linear prediction coefficients issued by the vector quantizer 205 and the residual signals from the operating unit 214, the tap generator 132 forms the same class taps as those of the tap generator 122 of FIG. 11 and sends the class taps so formed to the classification unit 133.

[0203] Based on the class taps from the tap generator 132, a classification unit 133 carries out the same classification as that performed by the classification unit 123 and routes the resulting class code to the normal equation summation circuit 134.

[0204] The normal equation summation circuit 134 receives the speech from the A/D converter 202 as teacher data, while receiving the prediction taps from the tap generator 131 as pupil data. The normal equation summation circuit 134 then performs summation similar to that performed by the normal equation addition circuit 81 of FIG. 6 to establish the normal equation shown as the equation (13) for each class.

[0205] A tap coefficient decision circuit 135 solves the normal equation, generated in the normal equation summation circuit 134 from class to class, to find tap coefficients for the respective classes. The tap coefficients thus found are sent to the address associated with each class of a coefficient memory 136.

[0206] Depending on the speech signals provided as speech signals for learning, there are occasions wherein, in a class or classes, the number of normal equations required to find the tap coefficients cannot be produced in the normal equation summation circuit 134. For such class(es), the tap coefficient decision circuit 135 outputs default tap coefficients.

[0207] The coefficient memory 136 memorizes the class-based tap coefficients supplied from the tap coefficient decision circuit 135.

[0208] The above-described learning device basically performs processing similar to that conforming to the flowchart shown in FIG. 8 to find tap coefficients for producing the synthesized sound of high sound quality.

[0209] The learning device is fed with speech signals for learning. At step S11, teacher data and pupil data are generated from the speech signals for learning.

[0210] That is, the speech signals for learning are fed to the microphone 201. The components from the microphone 201 to the code decision unit 215 perform processing similar to that performed by the components from the microphone 1 to the code decision unit 15 of FIG. 1.

[0211] The result is that the speech in the form of digital signals, obtained by the A/D converter 202, is sent as teacher data to the normal equation summation circuit 134. When it is verified in the minimum square error decision unit 208 that the square error has become smallest, the synthesized sound output by the speech synthesis filter 206 is sent as pupil data to the tap generator 131.

[0212] The linear prediction coefficients output by the vector quantizer 205, the L, G, I and A codes output by the code decision unit 215, and the residual signals output by the operating unit 214, all being those that prevail when the square error found by the minimum square error decision unit 208 is minimum, are sent to the tap generator 132.

[0213] The program then moves to step S12, where the tap generator 131, taking a frame of the synthesized sound supplied as pupil data from the speech synthesis filter 206 as the frame of interest, generates prediction taps from the synthesized sound of the frame of interest and sends the prediction taps so generated to the normal equation summation circuit 134. Also at step S12, the tap generator 132 generates class taps from the L, G, I and A codes, linear prediction coefficients and the residual signals supplied thereto, and sends the class taps so generated to the classification unit 133.

[0214] After the processing at step S12, the program moves to step S13, where the classification unit 133 performs classification based on the class taps from the tap generator 132 and sends the resulting class codes to the normal equation summation circuit 134.

[0215] The program then moves to step S14, where the normal equation summation circuit 134 performs, for each class code from the classification unit 133, the aforementioned summation for the matrix A and the vector v of the equation (13), using the speech signals for learning from the A/D converter 202, that is, the speech of high sound quality of the frame of interest, as teacher data, and the prediction taps from the tap generator 131 as pupil data. The program then moves to step S15.

[0216] At step S15, it is verified whether or not there is any further frame to be processed as the frame of interest. If it is found at step S15 that there is still a frame to be processed as the frame of interest, the program reverts to step S11, where processing similar to that described above is repeated with the next frame as the new frame of interest.

[0217] If it is found at step S15 that there is no frame to be processed as the frame of interest, that is, if the normal equation has been obtained for each class in the normal equation summation circuit 134, the program moves to step S16, where the tap coefficient decision circuit 135 solves the normal equation generated for each class to find the tap coefficients from class to class, and sends the tap coefficients so found to the address of the coefficient memory 136 associated with each class, whereupon the processing is terminated.
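The learning of steps S14 to S16 can be illustrated by the following minimal sketch. It assumes that the equation (13) is the normal equation A w = v of a least-squares fit, with A accumulated from products of pupil data (prediction taps) and v from products of pupil data and teacher data, separately for each class. The class name, the Gaussian elimination solver, the singularity threshold and the default coefficients are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Sequence


class NormalEquationSummation:
    """Accumulates the matrix A and the vector v separately for each class."""

    def __init__(self, n_taps: int) -> None:
        self.n_taps = n_taps
        self.A: Dict[int, List[List[float]]] = {}
        self.v: Dict[int, List[float]] = {}

    def add(self, class_code: int, pupil: Sequence[float], teacher: float) -> None:
        n = self.n_taps
        A = self.A.setdefault(class_code, [[0.0] * n for _ in range(n)])
        v = self.v.setdefault(class_code, [0.0] * n)
        for i in range(n):
            v[i] += pupil[i] * teacher
            for j in range(n):
                A[i][j] += pupil[i] * pupil[j]


def solve(A: List[List[float]], v: List[float]) -> Optional[List[float]]:
    """Gauss-Jordan elimination with partial pivoting; None if A is singular."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # assumed singularity threshold
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def decide_tap_coefficients(summation: NormalEquationSummation,
                            default: Sequence[float]) -> Dict[int, List[float]]:
    """Solve each class; fall back to default coefficients if unsolvable."""
    result = {}
    for class_code in summation.A:
        w = solve(summation.A[class_code], summation.v[class_code])
        result[class_code] = w if w is not None else list(default)
    return result


summation = NormalEquationSummation(n_taps=2)
# Class 0: teacher data generated by y = 2*x1 - x2.
for taps, teacher in [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]:
    summation.add(0, taps, teacher)
# Class 1: a single sample, too few to determine two coefficients.
summation.add(1, [1.0, 1.0], 1.0)
memory = decide_tap_coefficients(summation, default=[1.0, 0.0])
```

In this toy run, class 0 recovers coefficients close to 2 and −1, whereas class 1, for which the normal equation cannot be solved, receives the default coefficients.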

[0218] The class-based tap coefficients stored in the coefficient memory 136 are stored in the coefficient memory 124 of FIG. 11.

[0219] Consequently, the tap coefficients stored in the coefficient memory 124 of FIG. 11 have been found by carrying out the learning such that the prediction errors (square errors) of the predicted speech values of high sound quality obtained on linear predictive calculations will be statistically minimum, so that the speech output by the prediction unit 125 of FIG. 11 is of high sound quality.

[0220] The above-described sequence of operations may be carried out by hardware or by software. If the sequence of operations is carried out by software, the program forming the software is installed on, e.g., a general-purpose computer.

[0221] FIG. 13 shows an illustrative structure of an embodiment of a computer on which to install the program adapted for executing the above-described sequence of operations.

[0222] It is possible for the program to be pre-recorded on a hard disc 305 or a ROM 303 as a recording medium enclosed in a computer.

[0223] Alternatively, the program may be transiently or permanently stored in a removable recording medium 311, such as a CD-ROM (Compact Disc Read Only Memory), MO (magneto-optical) disc, DVD (Digital Versatile Disc), magnetic disc or a semiconductor memory. Such removable recording medium 311 may be furnished as so-called package software.

[0224] Meanwhile, the program may not only be installed on a computer from the above-described removable recording medium 311, but may also be transferred to the computer from a downloading site over a radio route, or over a network such as a LAN (Local Area Network) or the Internet. The program so transferred may be received by a communication unit 308 so as to be installed on an enclosed hard disc 305.

[0225] The computer has enclosed therein a CPU (Central Processing Unit) 302. To this CPU 302 is connected an input/output interface 310 over a bus 301. When a command is input to the CPU 302 over the input/output interface 310 by a user acting on an input unit 307, such as a keyboard, mouse or microphone, the program stored in the ROM (Read Only Memory) 303 is executed. Alternatively, the CPU 302 loads, on a RAM (Random Access Memory) 304 for execution, a program stored in the hard disc 305, a program transmitted over the satellite or network, received by the communication unit 308 and installed on the hard disc 305, or a program read out from the removable recording medium 311 and installed on the hard disc 305. The CPU 302 thus executes the processing in accordance with the above-described flowchart or the processing conforming to the above-described block diagram. The CPU 302 causes the processing results to be output, e.g., over the input/output interface 310 from an output unit 306 formed by an LCD (liquid crystal display) or a loudspeaker, transmitted from the communication unit 308, or recorded on the hard disc 305.

[0226] The processing steps stating the program for executing the various processing operations by a computer need not be carried out chronologically in the order stated in the flowchart, but may be processed in parallel or batch-wise, such as by parallel processing or object-wise processing.

[0227] The program may be processed by a sole computer or by plural computers in a distributed fashion. Moreover, the program may be transmitted to a remotely located computer for execution.

[0228] Although no particular reference has been made in the present invention as to which sort of speech signals for learning is to be used, the speech signals for learning may be not only speech uttered by a speaker but also, for example, a musical number (music). With the above-described learning, tap coefficients which will improve the sound quality of speech are obtained if speech uttered by a speaker is used as the speech signals for learning, whereas tap coefficients which will improve the sound quality of a musical number are obtained if the speech signals for learning are musical numbers.

[0229] In the embodiment shown in FIG. 11, the tap coefficients are pre-stored in the coefficient memory 124. Alternatively, the tap coefficients to be stored in the coefficient memory 124 may be downloaded to the portable telephone set 101 from the base station 102 or the exchange station 103 of FIG. 9, or from a WWW (World Wide Web) server, not shown. That is, tap coefficients suited to a given sort of speech signals, such as those for human speech or for music, may be obtained on learning. Depending on the teacher or pupil data used for learning, tap coefficients which will produce a difference in the sound quality of the synthesized sound may be acquired. So, these various tap coefficients may be stored in, e.g., the base station 102 for the user to download the tap coefficients he or she desires. Such service of downloading the tap coefficients may be payable or charge-free. If the service of downloading the tap coefficients is to be payable, the fee as remuneration for the downloaded tap coefficients may be charged along with the call toll of the portable telephone set 101.

[0230] The coefficient memory 124 may be formed by, e.g., a memory card that can be mounted on or dismounted from the portable telephone set 101. If, in this case, various memory cards having stored thereon the above-described various tap coefficients are furnished, the memory card holding the desired tap coefficients may be loaded and used on the portable telephone set 101.

[0231] The present invention may be broadly applied in generating the synthesized sound from the code obtained on encoding by the CELP system, such as VSELP (Vector Sum Excited Linear Prediction), PSI-CELP (Pitch Synchronous Innovation CELP) or CS-ACELP (Conjugate Structure Algebraic CELP).

[0232] The present invention also is broadly applicable not only to such a case where the synthesized sound is generated from the code obtained on encoding by the CELP system, but also to such a case where residual signals and linear prediction coefficients are obtained from a given code to generate the synthesized sound.

[0233] In the above-described embodiment, the prediction values of residual signals and linear prediction coefficients are found by one-dimensional linear predictive calculations. Alternatively, these prediction values may be found by two- or higher-dimensional predictive calculations.

[0234] Also, in the receipt unit shown in FIG. 11 and in the learning device shown in FIG. 12, the class taps are generated based not only on the L, G, I and A codes, but also on the linear prediction coefficients derived from the A codes and the residual signals derived from the L, G and I codes. The class taps may also be generated from only one or a plural number of the L, G, I and A codes, such as, for example, from only the A code. If, for example, the class taps are formed only from the I code, the I code itself may be used as the class code. Since the VSELP system allocates 9 bits to the I code, the number of the classes is 512 (=2⁹) if the I code is directly used as the class code. Meanwhile, since each bit of the 9-bit I code takes one of two signs, namely 1 and −1, it is sufficient to deem a bit which is −1 to be 0 when this I code is used as the class code.
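The use of the 9-bit I code as a class code can be sketched as follows. The sketch assumes that the I code is available as a sequence of nine signs, each being 1 or −1, and that the first sign is treated as the most significant bit; the function name and the bit ordering are assumptions and are not taken from the VSELP specification.

```python
from typing import Sequence

I_CODE_BITS = 9  # number of bits allocated to the I code in VSELP, as stated above


def i_code_to_class_code(signs: Sequence[int]) -> int:
    """Map nine signs in {1, -1} to a class code in the range 0..511.

    A sign of -1 is deemed to be bit 0, a sign of 1 is bit 1.
    """
    if len(signs) != I_CODE_BITS:
        raise ValueError("the I code is expected to have 9 signs")
    class_code = 0
    for sign in signs:
        if sign not in (1, -1):
            raise ValueError("each sign must be 1 or -1")
        class_code = (class_code << 1) | (1 if sign == 1 else 0)
    return class_code


number_of_classes = 2 ** I_CODE_BITS
example = i_code_to_class_code([1, -1, -1, -1, -1, -1, -1, -1, 1])
```

The resulting value can be used directly as the address of the coefficient memory.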

[0235] In the CELP system, software interpolation bits or the frame energy may sometimes be included in the code data. In this case, the class taps may be formed by using the software interpolation bits or the frame energy.

[0236] In Japanese Laying-Open Patent Publication H-8-202399, there is disclosed a method of passing the synthesized sound through a high range emphasizing filter to improve its sound quality. The present invention differs from the invention disclosed in the Japanese Laying-Open Patent Publication H-8-202399, e.g., in that the tap coefficients are obtained on learning and in that the tap coefficients used are determined from the results of the code-based classification.

[0237] Referring to the drawings, a modification of the present invention is explained in detail.

[0238] FIG. 14 shows a structure of a speech synthesis device embodying the present invention. This speech synthesis device is fed with code data in which are multiplexed the residual code and the A code, obtained respectively on coding the residual signals and the linear prediction coefficients to be sent to a speech synthesis filter 147. The residual signals and the linear prediction coefficients are found from the residual code and the A code, respectively, and routed to the speech synthesis filter 147 to generate the synthesized sound.

[0239] If the residual code is decoded into the residual signals based on the codebook which associates the residual signals with the residual code, the residual signals obtained on decoding are corrupted with errors, with the result that the synthesized sound is deteriorated in sound quality. Similarly, if the A code is decoded into linear prediction coefficients based on the codebook which associates the linear prediction coefficients with the A code, the decoded linear prediction coefficients are again corrupted with errors, thus deteriorating the sound quality of the synthesized sound.

[0240] So, in the speech synthesis device of FIG. 14, the predictive calculations are carried out using tap coefficients as found on learning to find prediction values for the true residual signals and linear prediction coefficients, and the synthesized sound of high sound quality is produced using these prediction values.

[0241] That is, in the speech synthesis device of FIG. 14, the decoded linear prediction coefficients are decoded into prediction values of the true linear prediction coefficients using, e.g., the classification adaptive processing.

[0242] The classification adaptive processing is made up of classification processing and adaptive processing. By the classification processing, the data is classified depending on data properties, and the adaptive processing is carried out from class to class. The adaptive processing is carried out by a technique which is the same as that described above. So, reference may be had to the foregoing description, and detailed description is not made here for simplicity.

[0243] In the speech synthesis device shown in FIG. 14, the decoded linear prediction coefficients are decoded into true linear prediction coefficients, more precisely prediction values thereof, whilst the decoded residual signals are also decoded into true residual signals, more precisely prediction values thereof.

[0244] That is, a demultiplexer (DEMUX) 141 is fed with code data and separates the code data supplied into the frame-based A code and residual code, which are routed to a filter coefficient decoder 142A and a residual codebook storage unit 142E, respectively. It should be noted that the A code and the residual code, included in the code data in FIG. 14, are obtained on vector quantization, using a preset codebook, of the linear prediction coefficients and residual signals obtained in turn on LPC analysis of the speech in terms of a preset frame as unit.

[0245] The filter coefficient decoder 142A decodes the frame-based A code, supplied from the demultiplexer 141, into decoded linear prediction coefficients, based on the same codebook as that used in obtaining the A code, and routes the resulting decoded linear prediction coefficients to the tap generator 143A.

[0246] The residual codebook storage unit 142E memorizes the same codebook as that used in obtaining the frame-based residual code supplied from the demultiplexer 141, and decodes the residual code from the demultiplexer 141 into decoded residual signals, based on the codebook, to route the decoded residual signals so produced to the tap generator 143E.

[0247] From the frame-based decoded linear prediction coefficients supplied from the filter coefficient decoder 142A, the tap generator 143A extracts what are to be class taps used in classification in a classification unit 144A, and what are to be prediction taps used in predictive calculations in a prediction unit 146A, as later explained. That is, the tap generator 143A sets the totality of the decoded linear prediction coefficients as the prediction taps and the class taps for the linear prediction coefficients. The tap generator 143A sends the class taps and the prediction taps pertinent to the linear prediction coefficients to the classification unit 144A and to the prediction unit 146A, respectively.

[0248] From the frame-based decoded residual signals supplied from the residual codebook storage unit 142E, the tap generator 143E extracts what are to be class taps and what are to be prediction taps. That is, the tap generator 143E makes all sample values of the decoded residual signals of the frame being processed into the class taps and the prediction taps for the residual signals. The tap generator 143E sends the class taps and the prediction taps pertinent to the residual signals to the classification unit 144E and to the prediction unit 146E, respectively.

[0249] The constituent patterns of the prediction taps and class taps are not limited to the above-mentioned patterns.

[0250] It should be noted that the tap generator 143A may be designed to extract the class taps and prediction taps of the linear prediction coefficients from both the decoded linear prediction coefficients and the decoded residual signals. The class taps and prediction taps pertinent to the linear prediction coefficients may also be extracted by the tap generator 143A from the A code and the residual code. The class taps and prediction taps of the linear prediction coefficients may also be extracted from signals already output from the downstream side prediction units 146A or 146E, or from the synthesized speech signals already output by the speech synthesis filter 147. It is also possible for the tap generator 143E to extract the class taps and prediction taps pertinent to the residual signals in similar manner.

[0251] Based on the class taps pertinent to the linear prediction coefficients from the tap generator 143A, the classification unit 144A classifies the linear prediction coefficients of the frame of interest, that is, the frame for which the prediction values of the true linear prediction coefficients are to be found, and outputs the class code corresponding to the resulting class to a coefficient memory 145A.

[0252] As the method for classification, ADRC (Adaptive Dynamic Range Coding), for example, may be employed.

[0253] In a method employing the ADRC, the decoded linear prediction coefficients forming the class taps are ADRC processed and, based on the resulting ADRC code, the class of the linear prediction coefficients of the frame of interest is determined.

[0254] In a K-bit ADRC, the maximum value MAX and the minimum value MIN of the decoded linear prediction coefficients forming the class taps are detected, and DR=MAX−MIN is set as the local dynamic range of the set. Based on this dynamic range DR, the decoded linear prediction coefficients forming the class taps are re-quantized into K bits. That is, the minimum value MIN is subtracted from each of the decoded linear prediction coefficients forming the class taps, and the resulting difference value is divided by DR/2^K. The K-bit values of the respective decoded linear prediction coefficients forming the class taps, obtained as described above, are arrayed in a preset sequence to form a bit string, which is output as an ADRC code. Thus, if the class taps are processed with, e.g., one-bit ADRC, the minimum value MIN is subtracted from the respective decoded linear prediction coefficients forming the class taps, and the resulting difference value is divided by the average value of the maximum value MAX and the minimum value MIN, whereby the respective decoded linear prediction coefficients are turned into one-bit values by way of binary coding. The bit string obtained on arraying the one-bit values in a preset sequence is output as the ADRC code.
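The K-bit ADRC described above can be illustrated by the following minimal sketch. It follows the general rule of subtracting MIN and dividing by DR/2^K for every K, including K=1, truncates the quotient to an integer, and clips the value obtained for MAX to the highest level; the truncation, the clipping, the handling of DR=0 and the packing order of the bit string are assumptions made for illustration.

```python
from typing import Sequence


def adrc_code(class_taps: Sequence[float], k_bits: int = 1) -> int:
    """Re-quantize each class tap to k_bits and pack the results into one code."""
    minimum = min(class_taps)
    maximum = max(class_taps)
    dynamic_range = maximum - minimum  # DR = MAX - MIN
    levels = 1 << k_bits               # 2^K quantization levels
    code = 0
    for tap in class_taps:
        if dynamic_range == 0:
            quantized = 0  # flat class taps: assumed to map to level 0
        else:
            quantized = int((tap - minimum) / (dynamic_range / levels))
            quantized = min(quantized, levels - 1)  # MAX would give 2^K
        # Array the K-bit values in tap order, first tap most significant.
        code = (code << k_bits) | quantized
    return code


taps = [0.0, 1.0, 0.25, 0.75]
one_bit_code = adrc_code(taps, k_bits=1)  # levels 0,1,0,1 -> 0b0101
two_bit_code = adrc_code(taps, k_bits=2)  # levels 0,3,1,3 -> 0b00110111
```

With N class taps, the ADRC code takes at most 2^(N·K) distinct values, which is what makes the compression useful for classification.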

[0255] The string of values of the decoded linear prediction coefficients forming the class taps may directly be output as the class code by the classification unit 144A. However, if the class taps are formed as p-dimensional linear prediction coefficients, and K bits are allocated to the respective decoded linear prediction coefficients, the number of different class codes output by the classification unit 144A is (2^K)^p, which is an extremely large value increasing exponentially with the number of bits K of the decoded linear prediction coefficients.
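The size of this number can be checked with a short calculation. The values p=10 and K=8 below are purely illustrative assumptions and are not taken from the specification; the comparison with one-bit ADRC assumes one bit per class tap.

```python
def direct_class_count(p: int, k_bits: int) -> int:
    """Number of class codes when each of p coefficients keeps k_bits: (2^K)^p."""
    return (2 ** k_bits) ** p


p, k_bits = 10, 8                              # illustrative values only
direct = direct_class_count(p, k_bits)         # (2^8)^10 = 2^80 classes
with_one_bit_adrc = direct_class_count(p, 1)   # 2^10 = 1024 classes
```

Even for these modest values the direct class code would address 2^80 entries, whereas one-bit ADRC reduces the count to 1024.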

[0256] Thus, classification in the classification unit 144A is preferably carried out after compressing the information volume of the class taps by, e.g., the ADRC processing or vector quantization.

[0257] Similarly to the classification unit 144A, the classification unit 144E carries out classification of the frame of interest, based on the class taps supplied from the tap generator 143E, and outputs the resulting class codes to the coefficient memory 145E.

[0258] The coefficient memory 145A holds the class-based tap coefficients pertinent to the linear prediction coefficients, obtained on performing the learning in a learning device of FIG. 17 as later explained, and outputs the tap coefficients, stored in the address associated with the class code output by the classification unit 144A, to the prediction unit 146A.

[0259] The coefficient memory 145E holds the class-based tap coefficients pertinent to the residual signals, as obtained by carrying out the learning in the learning device of FIG. 17, and outputs the tap coefficients, stored in the address corresponding to the class code output by the classification unit 144E, to the prediction unit 146E.

[0260] If p-dimensional linear prediction coefficients are to be found in each frame by the predictive calculations of the aforementioned equation (6), p sets of the tap coefficients are needed. Thus, in the coefficient memory 145A, p sets of the tap coefficients are stored in the address associated with one class code. For the same reason, the same number of sets of tap coefficients as that of the sample points of the residual signals in each frame is stored in the coefficient memory 145E.

[0261] The prediction unit 146A acquires the prediction taps output by the tap generator 143A and the tap coefficients output by the coefficient memory 145A and, using these prediction taps and tap coefficients, performs the linear predictive calculations (sum-of-products processing) shown by the equation (6) to find the p-dimensional linear prediction coefficients of the frame of interest, more precisely the predicted values thereof, and sends the values so found to the speech synthesis filter 147.

[0262] The prediction unit 146E acquires the prediction taps output by the tap generator 143E and the tap coefficients output by the coefficient memory 145E. Using the prediction taps and tap coefficients so acquired, the prediction unit 146E carries out the linear predictive calculations shown by the equation (6) to find predicted values of the residual signals of the frame of interest, and outputs the values so found to the speech synthesis filter 147.

[0263] The coefficient memory 145A outputs p sets of tap coefficients for finding the predicted values of the p-dimensional linear prediction coefficients forming the frame of interest. The prediction unit 146A executes the sum-of-products processing of the equation (6), using the prediction taps and the set of tap coefficients corresponding to each dimension, in order to find the linear prediction coefficients of the respective dimensions. The same holds for the prediction unit 146E.
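The use of p sets of tap coefficients can be sketched as follows, again assuming that the equation (6) is a plain sum of products. Each set yields one dimension of the predicted linear prediction coefficients from the same prediction taps. The function name and the toy values are hypothetical.

```python
from typing import List, Sequence


def predict_lpc(prediction_taps: Sequence[float],
                coefficient_sets: Sequence[Sequence[float]]) -> List[float]:
    """Apply one set of tap coefficients per output dimension.

    coefficient_sets holds p sets, each as long as prediction_taps, so the
    result is a list of p predicted linear prediction coefficients.
    """
    predicted = []
    for tap_coefficients in coefficient_sets:
        if len(tap_coefficients) != len(prediction_taps):
            raise ValueError("each set must match the number of prediction taps")
        predicted.append(
            sum(w * x for w, x in zip(tap_coefficients, prediction_taps)))
    return predicted


# p = 2: decoded coefficients serve as prediction taps (toy values).
decoded_lpc = [0.5, -0.25]
sets_for_class = [[1.0, 0.0],   # dimension 1: passes the first tap through
                  [0.5, 2.0]]   # dimension 2: mixes both taps
predicted_lpc = predict_lpc(decoded_lpc, sets_for_class)
```

The same routine applies to the prediction unit 146E if the sets are taken to number as many as the sample points of the residual signals in a frame.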

[0264] Similarly to the speech synthesis unit 29, explained with reference to FIG. 1, the speech synthesis filter 147 is an IIR type digital filter, and carries out the filtering of the residual signals from the prediction unit 146E as input signal, with the linear prediction coefficients from the prediction unit 146A as tap coefficients of the IIR filter, to generate the synthesized sound, which is input to a D/A converter 148. The D/A converter 148 D/A converts the synthesized sound from the speech synthesis filter 147 from digital signals into analog signals, which are sent to and output at a loudspeaker 149.
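An IIR synthesis filter of this kind can be sketched as follows. The sketch assumes the all-pole relation s[n] + a_1 s[n−1] + ... + a_p s[n−p] = e[n], where e is the residual signal and a_1 to a_p are the linear prediction coefficients; the sign convention, the zero initial state and the function name are assumptions, since the defining equation lies outside this section.

```python
from typing import List, Optional, Sequence


def synthesize(residual: Sequence[float],
               lpc: Sequence[float],
               history: Optional[Sequence[float]] = None) -> List[float]:
    """All-pole filtering: s[n] = e[n] - sum(a_i * s[n-i]) for i = 1..p.

    history holds the p most recent past outputs, most recent first;
    it defaults to zeros (an assumed initial state).
    """
    p = len(lpc)
    past = list(history) if history is not None else [0.0] * p
    output = []
    for e in residual:
        s = e - sum(a * y for a, y in zip(lpc, past))
        output.append(s)
        if p:
            past = [s] + past[:-1]  # shift the newest sample into the state
    return output


# One-pole example: an impulse residual decays geometrically.
synthesized = synthesize([1.0, 0.0, 0.0], lpc=[-0.5])
```

Carrying the final state over as the history of the next call would allow frame-by-frame filtering with coefficients updated for every frame.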

[0265] In FIG. 14, class taps are generated in the tap generators 143A, 143E, classification based on these class taps is carried out in the classification units 144A, 144E, and tap coefficients for the linear prediction coefficients and the residual signals corresponding to the class codes, as being the results of the classification, are acquired from the coefficient memories 145A, 145E. Alternatively, the tap coefficients for the linear prediction coefficients and the residual signals can be acquired as follows:

[0266] That is, the tap generators 143A, 143E, classification units 144A, 144E and the coefficient memories 145A, 145E are constructed as respective integral units. If the tap generators, classification units and the coefficient memories, constructed as respective integral units, are named a tap generator 143, a classification unit 144 and a coefficient memory 145, respectively, the tap generator 143 is caused to form class taps from the decoded linear prediction coefficients and decoded residual signals, while the classification unit 144 is caused to perform classification based on the class taps to output one class code. The coefficient memory 145 is caused to hold sets of tap coefficients for the linear prediction coefficients and tap coefficients for the residual signals, and is caused to output the sets of the tap coefficients for each of the linear prediction coefficients and the residual signals stored in the address associated with the class code output by the classification unit 144. The prediction units 146A, 146E may then be caused to carry out the processing based on the tap coefficients for the linear prediction coefficients and on the tap coefficients for the residual signals, respectively, output as sets from the coefficient memory 145.

[0267] If the tap generators 143A, 143E, classification units 144A, 144E and the coefficient memories 145A, 145E are constructed as respective separate units, the number of classes for the linear prediction coefficients is not necessarily the same as the number of classes for the residual signals. In case of construction as the integral units, the number of the classes of the linear prediction coefficients is the same as that of the residual signals.

[0268] FIG. 15 shows a specific structure of the speech synthesis filter 147 making up the speech synthesis device shown in FIG. 14.

[0269] The speech synthesis filter 147 uses the p-dimensional linear prediction coefficients, as shown in FIG. 15, and hence is made up of a single adder 151, p delay circuits (D) 152₁ to 152_(p) and p multipliers 153₁ to 153_(p).

[0270] In the multipliers 153₁ to 153_(p) are set the p-dimensional linear prediction coefficients α₁, α₂, . . . , α_(p), supplied from the prediction unit 146A, whereby the speech synthesis filter 147 performs calculations in accordance with the equation (4) to generate the synthesized sound.

[0271] That is, the residual signals, output by the prediction unit 146E, are sent to the delay circuit 152₁ through the adder 151. The delay circuit 152_(p) delays the input signal by one sample of the residual signals to output the delayed signal to the downstream side delay circuit 152_(p+1) and to the multiplier 153_(p). The multiplier 153_(p) multiplies the output of the delay circuit 152_(p) with the linear prediction coefficient α_(p) set thereat to send the resulting product value to the adder 151.

[0272] The adder 151 sums all outputs of the multipliers 153₁ to 153_(p) and the residual signals e to send the resulting sum to the delay circuit 152₁ and to output the sum as the result of speech synthesis (resulting sound signal).
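The recursive structure of FIG. 15 can be sketched as below. The sketch is illustrative only and rests on an assumption about sign: taking the relation E = (1 + α₁z⁻¹ + . . . + α_(p)z^(−p))S as the analysis side, the synthesis side is assumed to be s_n = e_n − (α₁s_(n−1) + . . . + α_(p)s_(n−p)). If the equation (4) is written with the opposite sign convention, the subtraction becomes an addition. The function name synthesize and the sample values are hypothetical.

```python
def synthesize(residual, alpha, state=None):
    """All-pole (IIR) synthesis: s[n] = e[n] - sum_k alpha[k] * s[n-1-k].

    residual : residual samples e of the frame of interest
    alpha    : linear prediction coefficients alpha_1 .. alpha_p
    state    : optional past outputs, most recent first (the delay circuits)
    """
    p = len(alpha)
    history = list(state) if state is not None else [0.0] * p
    out = []
    for e in residual:
        # Multipliers weight the delayed outputs; the adder combines them with e.
        s = e - sum(a * h for a, h in zip(alpha, history))
        out.append(s)
        # Shift the delay line by one sample.
        history = ([s] + history)[:p]
    return out, history


# Hypothetical first-order example: a unit impulse decays geometrically.
speech, final_state = synthesize([1.0, 0.0, 0.0], [-0.5])
```

Returning the delay-line state allows consecutive frames to be filtered without a discontinuity at the frame boundary.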

[0273] Referring to the flowchart of FIG. 16, the speech synthesis processing of FIG. 14 is explained.

[0274] The demultiplexer 141 sequentially separates the frame-based A code and residual code from the code data supplied thereto, to send the separated codes to the filter coefficient decoder 142A and to the residual codebook storage unit 142E, respectively.

[0275] The filter coefficient decoder 142A sequentially decodes the frame-based A code, supplied from the demultiplexer 141, into decoded linear prediction coefficients, which are supplied to the tap generator 143A. The residual codebook storage unit 142E sequentially decodes the frame-based residual codes, supplied from the demultiplexer 141, into decoded residual signals, which are sent to the tap generator 143E.

[0276] The tap generator 143A sequentially renders the frames of the decoded linear prediction coefficients supplied thereto the frames of interest. The tap generator 143A at step S101 generates the class taps and the prediction taps from the decoded linear prediction coefficients supplied from the filter coefficient decoder 142A. At step S101, the tap generator 143E also generates class taps and prediction taps from the decoded residual signals supplied from the residual codebook storage unit 142E. The class taps generated by the tap generator 143A are supplied to the classification unit 144A, while the prediction taps are sent to the prediction unit 146A. The class taps generated by the tap generator 143E are sent to the classification unit 144E, while the prediction taps are sent to the prediction unit 146E.

[0277] At step S102, the classification units 144A, 144E perform classification based on the class taps supplied from the tap generators 143A, 143E and send the resulting class codes to the coefficient memories 145A, 145E. The program then moves to step S103.

[0278] At step S103, the coefficient memories 145A, 145E read out tap coefficients from the addresses for the class codes sent from the classification units 144A, 144E to send the read out coefficients to the prediction units 146A, 146E.

[0279] The program then moves to step S104, where the prediction unit 146A acquires the tap coefficients output by the coefficient memory 145A and, using these tap coefficients and the prediction taps from the tap generator 143A, performs the sum-of-products processing shown by the equation (6) to acquire the true linear prediction coefficients of the frame of interest, more precisely predicted values thereof. At step S104, the prediction unit 146E likewise acquires the tap coefficients output by the coefficient memory 145E and, using these tap coefficients and the prediction taps from the tap generator 143E, performs the sum-of-products processing shown by the equation (6) to acquire the true residual signals of the frame of interest, more precisely predicted values thereof.

[0280] The residual signals and the linear prediction coefficients, obtained as described above, are sent to the speech synthesis filter 147, which then performs the calculations of the equation (4), using the residual signals and the linear prediction coefficients, to produce the synthesized sound signal of the frame of interest. The synthesized sound signal is sent from the speech synthesis filter 147 through the D/A converter 148 to the loudspeaker 149, which then outputs the synthesized sound corresponding to the synthesized sound signal.

[0281] After the linear prediction coefficients and the residual signals have been obtained in the prediction units 146A, 146E, the program moves to step S105, where it is verified whether or not there remain any decoded linear prediction coefficients and decoded residual signals to be processed as the frame of interest. If it is verified at step S105 that there remain decoded linear prediction coefficients and decoded residual signals to be processed as the frame of interest, the program reverts to step S101, where the frame to be rendered the frame of interest next is rendered the new frame of interest. A similar sequence of operations is then carried out. If it is verified at step S105 that there are no decoded linear prediction coefficients nor decoded residual signals to be processed as the frame of interest, the speech synthesis processing is terminated.
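The loop over steps S101 to S105 may be pictured, for one of the two chains (A or E), by the sketch below. It is an illustration under stated assumptions only: the tap generation and the classification rule are passed in as callables because their details are not fixed here, the toy classifier based on the sign of the tap sum is hypothetical and is not the classification of the embodiment, and the equation (6) is assumed to be a plain sum of products.

```python
def decode_frames(frames, make_taps, classify, coefficient_memory):
    """Run steps S101 to S104 for every frame; the for-loop plays the role of S105."""
    outputs = []
    for frame in frames:
        class_taps, prediction_taps = make_taps(frame)        # step S101
        class_code = classify(class_taps)                     # step S102
        coefficient_sets = coefficient_memory[class_code]     # step S103
        outputs.append([                                      # step S104
            sum(w * x for w, x in zip(tap_coefficients, prediction_taps))
            for tap_coefficients in coefficient_sets
        ])
    return outputs


# Hypothetical stand-ins: the frame itself serves as both class and prediction taps,
# and the class code is 0 or 1 depending on the sign of the tap sum.
memory = {0: [[1.0, 0.0]], 1: [[0.0, 1.0]]}
result = decode_frames(
    frames=[[1.0, 2.0], [-3.0, 0.5]],
    make_taps=lambda frame: (frame, frame),
    classify=lambda taps: 0 if sum(taps) >= 0 else 1,
    coefficient_memory=memory,
)
```

In the device of FIG. 14 two such chains run side by side, and their outputs are fed to the speech synthesis filter 147 frame by frame.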

[0282] The learning device for carrying out the learning of the tap coefficients to be stored in the coefficient memories 145A, 145E shown in FIG. 14 is configured as shown in FIG. 17.

[0283] The learning device, shown in FIG. 17, is fed with the digital speech signals for learning, on the frame basis. These digital speech signals for learning are sent to an LPC analysis unit 161A and to a prediction filter 161E.

[0284] The LPC analysis unit 161A sequentially renders the frames of the speech signals, supplied thereto, the frames of interest, and LPC-analyzes the speech signals of the frame of interest to find p-dimensional linear prediction coefficients. These linear prediction coefficients are sent to the prediction filter 161E and to a vector quantizer 162A, while being sent to a normal equation addition circuit 166A as teacher data for finding tap coefficients pertinent to the linear prediction coefficients.

[0285] The prediction filter 161E performs calculations in accordance with the equation (1), using the speech signals and the linear prediction coefficients supplied thereto, to find residual signals of the frame of interest. The resulting residual signals are sent to the vector quantizer 162E, and are also sent to the normal equation addition circuit 166E as teacher data for finding tap coefficients pertinent to the residual signals.

[0286] That is, if the Z-transforms of s_(n) and e_(n) in the equation (1) are represented by S and E, respectively, the equation (1) may be represented by:

E = (1 + α₁z⁻¹ + α₂z⁻² + . . . + α_(p)z^(−p))S.  (15)

[0287] From the equation (15), the residual signals e can be found by the sum-of-products processing of the speech signal s and the linear prediction coefficients α_(p), so that the prediction filter 161E for finding the residual signals e may be formed by an FIR (Finite Impulse Response) digital filter.

[0288] FIG. 18 shows an illustrative structure of the prediction filter 161E.

[0289] The prediction filter 161E is fed with p-dimensional linear prediction coefficients from the LPC analysis unit 161A. So, the prediction filter 161E is made up of p delay circuits (D) 171₁ to 171_(p), p multipliers 172₁ to 172_(p) and one adder 173.

[0290] In the multipliers 172₁ to 172_(p) are set α₁, α₂, . . . , α_(p) from among the p-dimensional linear prediction coefficients sent from the LPC analysis unit 161A.

[0291] The speech signals s of the frame of interest are sent to the delay circuit 171₁ and to the adder 173. The delay circuit 171_(p) delays the input signal thereto by one sample of the speech signals to output the delayed signal to the downstream side delay circuit 171_(p+1) and to the multiplier 172_(p). The multiplier 172_(p) multiplies the output of the delay circuit 171_(p) with the linear prediction coefficient α_(p) to send the resulting product to the adder 173.

[0292] The adder 173 sums all of the outputs of the multipliers 172₁ to 172_(p) and the speech signals s to output the result of the summation as the residual signals e.
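The FIR structure of FIG. 18 follows directly from the equation (15), which in the time domain reads e_n = s_n + α₁s_(n−1) + . . . + α_(p)s_(n−p). A minimal sketch is given below; the function name residual_signal and the sample values are hypothetical, and samples preceding the first frame are assumed to be zero.

```python
def residual_signal(speech, alpha):
    """FIR prediction filter: e[n] = s[n] + sum_k alpha[k] * s[n-1-k]."""
    p = len(alpha)
    history = [0.0] * p          # delay circuits, most recent sample first
    residual = []
    for s in speech:
        # Multipliers weight the delayed speech samples; the adder adds s itself.
        e = s + sum(a * h for a, h in zip(alpha, history))
        residual.append(e)
        history = ([s] + history)[:p]
    return residual


# Hypothetical first-order example: a geometrically decaying signal
# is reduced to a single impulse by the matching coefficient.
e = residual_signal([1.0, 0.5, 0.25], [-0.5])
```

Under the same sign convention, feeding these residual signals and the same coefficients to an all-pole filter of the kind used for synthesis would restore the original speech samples.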

[0293] Returning to FIG. 17, the vector quantizer 162A holds a codebook which associates the code vectors, having the linear prediction coefficients as components, with the codes. Based on the codebook, the vector quantizer 162A vector-quantizes the feature vector constituted by the linear prediction coefficients of the frame of interest from the LPC analysis unit 161A to route the A code obtained on the vector quantization to a filter coefficient decoder 163A. The vector quantizer 162E holds a codebook which associates the code vectors, having the sample values of the residual signals as components, with the codes, and vector-quantizes the residual vectors, formed by sample values of the residual signals of the frame of interest from the prediction filter 161E, to route the residual code obtained on this vector quantization to a residual codebook storage unit 163E.

[0294] The filter coefficient decoder 163A holds the same codebook as that stored by the vector quantizer 162A and, based on this codebook, decodes the A code from the vector quantizer 162A into decoded linear prediction coefficients, which then are sent to the tap generator 164A as pupil data used for finding the tap coefficients pertinent to the linear prediction coefficients. The filter coefficient decoder 142A shown in FIG. 14 is configured similarly to the filter coefficient decoder 163A shown in FIG. 17.

[0295] The residual codebook storage unit 163E holds the same codebook as that stored by the vector quantizer 162E and, based on this codebook, decodes the residual code from the vector quantizer 162E into decoded residual signals, which then are sent to the tap generator 164E as pupil data used for finding the tap coefficients pertinent to the residual signals. The residual codebook storage unit 142E shown in FIG. 14 is configured similarly to the residual codebook storage unit 163E shown in FIG. 17.
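The pairing of a vector quantizer with a decoder holding the same codebook can be sketched as follows. The sketch is illustrative: the squared-error distance used to select the code vector is an assumption, since the distance measure is not specified here, and the names vq_encode, vq_decode and the codebook contents are hypothetical.

```python
def vq_encode(vector, codebook):
    """Return the code (index) of the code vector closest to the input vector."""
    def squared_error(code_vector):
        return sum((a - b) ** 2 for a, b in zip(vector, code_vector))
    return min(range(len(codebook)), key=lambda code: squared_error(codebook[code]))


def vq_decode(code, codebook):
    """Look up the code vector associated with a code in the same codebook."""
    return list(codebook[code])


# Hypothetical codebook with three two-dimensional code vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
code = vq_encode([0.9, 1.2], codebook)      # teacher-side vector -> code
decoded = vq_decode(code, codebook)         # code -> pupil-side (quantized) vector
```

The difference between the input vector and the decoded vector is the quantization error, which is what the learned tap coefficients are meant to compensate.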

[0296] Similarly to the tap generator 143A of FIG. 14, the tap generator 164A forms prediction taps and class taps from the decoded linear prediction coefficients supplied from the filter coefficient decoder 163A, to send the class taps to a classification unit 165A, while supplying the prediction taps to the normal equation addition circuit 166A. Similarly to the tap generator 143E of FIG. 14, the tap generator 164E forms prediction taps and class taps from the decoded residual signals supplied from the residual codebook storage unit 163E, to send the class taps to the classification unit 165E and the prediction taps to the normal equation addition circuit 166E.

[0297] Similarly to the classification units 144A and 144E of FIG. 14, the classification units 165A and 165E perform classification based on the class taps supplied thereto to send the resulting class codes to the normal equation addition circuits 166A and 166E.

[0298] The normal equation addition circuit 166A executes summation on the linear prediction coefficients of the frame of interest, as teacher data from the LPC analysis unit 161A, and on the decoded linear prediction coefficients, forming prediction taps, as pupil data from the tap generator 164A. The normal equation addition circuit 166E executes summation on the residual signals of the frame of interest, as teacher data from the prediction filter 161E, and on the decoded residual signals, forming prediction taps, as pupil data from the tap generator 164E.

[0299] That is, the normal equation addition circuit 166A uses the pupil data forming the prediction taps to perform calculations equivalent to the multiplication of the pupil data with one another (x_(in)x_(im)) and to the summation (Σ), which yield the components of the matrix A of the above-mentioned equation (13), for each class of the class code supplied from the classification unit 165A.

[0300] The normal equation addition circuit 166A also uses the pupil data, that is the decoded linear prediction coefficients forming the prediction taps, and the teacher data, that is the linear prediction coefficients of the frame of interest, to perform calculations equivalent to the multiplication (x_(in)y_(i)) of the pupil and teacher data and to the summation (Σ), which yield the components of the vector v of the equation (13), for each class of the class code supplied from the classification unit 165A.

[0301] The normal equation addition circuit 166A performs the aforementioned summation, with the totality of the frames of the linear prediction coefficients supplied from the LPC analysis unit 161A as the frames of interest, to establish, for each class, the normal equation pertinent to the linear prediction coefficients shown in the equation (13).

[0302] The normal equation addition circuit 166E also performs similar summation, with all of the frames of the residual signals sent from the prediction filter 161E as the frames of interest, whereby a normal equation concerning the residual signals, as shown in the equation (13), is established for each class.

[0303] A tap coefficient decision circuit 167A and a tap coefficient decision circuit 167E solve the normal equations, generated in the normal equation addition circuits 166A, 166E, from class to class, to find tap coefficients for the linear prediction coefficients and for the residual signals, which are sent to the addresses associated with the respective classes of the coefficient memories 168A, 168E.

[0304] Depending on the speech signals provided as speech signals for learning, there are occasions wherein, in a class or classes, the number of normal equations required to find the tap coefficients cannot be produced in the normal equation addition circuit 166A or 166E. For such class(es), the tap coefficient decision circuit 167A or 167E outputs default tap coefficients.
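The class-by-class summation and solution described above can be sketched as follows. This is a minimal sketch, assuming that the equation (13) is the usual least-squares normal equation A·w = v with A accumulated from products of pupil data and v from products of pupil and teacher data. The class NormalEquation, the function learn, the singularity threshold and the sample data are hypothetical; Gauss-Jordan elimination is used merely as one possible solver.

```python
class NormalEquation:
    """Accumulates A = sum(x x^T) and v = sum(x y) for one class."""

    def __init__(self, n_taps):
        self.n = n_taps
        self.A = [[0.0] * n_taps for _ in range(n_taps)]
        self.v = [0.0] * n_taps

    def add(self, pupil_taps, teacher):
        for i, xi in enumerate(pupil_taps):
            for j, xj in enumerate(pupil_taps):
                self.A[i][j] += xi * xj          # components of matrix A
            self.v[i] += xi * teacher            # components of vector v

    def solve(self, default):
        """Solve A w = v; fall back to default tap coefficients if A is singular."""
        n = self.n
        m = [row[:] + [rhs] for row, rhs in zip(self.A, self.v)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            if abs(m[pivot][col]) < 1e-12:       # too few independent samples
                return list(default)
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(n):
                if r != col:
                    f = m[r][col] / m[col][col]
                    for c in range(col, n + 1):
                        m[r][c] -= f * m[col][c]
        return [m[i][n] / m[i][i] for i in range(n)]


def learn(samples, n_taps, n_classes, default):
    """samples: iterable of (class_code, pupil_taps, teacher) triples."""
    equations = [NormalEquation(n_taps) for _ in range(n_classes)]
    for class_code, pupil_taps, teacher in samples:
        equations[class_code].add(pupil_taps, teacher)
    return [eq.solve(default) for eq in equations]


# Hypothetical data: class 0 follows y = 2*x1 - x2; class 1 has a single sample only.
samples = [(0, [1.0, 0.0], 2.0), (0, [0.0, 1.0], -1.0), (0, [1.0, 1.0], 1.0),
           (1, [1.0, 1.0], 1.0)]
coefficients = learn(samples, n_taps=2, n_classes=2, default=[0.5, 0.5])
```

Class 1 in this example cannot be solved uniquely from one sample, so the default tap coefficients are returned for it, mirroring the behaviour of the tap coefficient decision circuits for classes with insufficient data.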

[0305] The coefficient memories 168A, 168E store the class-based tap coefficients for the linear prediction coefficients and for the residual signals, supplied from the tap coefficient decision circuits 167A, 167E, respectively.

[0306] Referring to the flowchart of FIG. 19, the processing for learning of the learning device of FIG. 17 is explained.

[0307] The learning device is supplied with speech signals for learning. At step S111, teacher data and pupil data are generated from the speech signals for learning.

[0308] That is, the LPC analysis unit 161A sequentially renders the frames of the speech signals for learning the frames of interest, and LPC-analyzes the speech signals of the frame of interest to find p-dimensional linear prediction coefficients, which are sent as teacher data to the normal equation addition circuit 166A. These linear prediction coefficients are also sent to the prediction filter 161E and to the vector quantizer 162A. This vector quantizer 162A vector-quantizes the feature vector formed by the linear prediction coefficients of the frame of interest from the LPC analysis unit 161A to send the A code obtained by this vector quantization to the filter coefficient decoder 163A. The filter coefficient decoder 163A decodes the A code from the vector quantizer 162A into decoded linear prediction coefficients, which are sent as pupil data to the tap generator 164A.

[0309] On the other hand, the prediction filter 161E, which has received the linear prediction coefficients of the frame of interest from the LPC analysis unit 161A, performs the calculations conforming to the aforementioned equation (1), using the linear prediction coefficients and the speech signals for learning of the frame of interest, to find the residual signals of the frame of interest, which are sent to the normal equation addition circuit 166E as teacher data. These residual signals are also sent to the vector quantizer 162E. This vector quantizer 162E vector-quantizes the residual vector, constituted by sample values of the residual signals of the frame of interest from the prediction filter 161E, to send the residual code obtained as the result of the vector quantization to the residual codebook storage unit 163E. The residual codebook storage unit 163E decodes the residual code from the vector quantizer 162E to form decoded residual signals, which are sent as pupil data to the tap generator 164E.

[0310] The program then moves to step S112, where the tap generator 164A forms prediction taps and class taps pertinent to the linear prediction coefficients from the decoded linear prediction coefficients sent from the filter coefficient decoder 163A, whilst the tap generator 164E forms prediction taps and class taps pertinent to the residual signals from the decoded residual signals supplied from the residual codebook storage unit 163E. The class taps pertinent to the linear prediction coefficients are sent to the classification unit 165A, whilst the prediction taps are sent to the normal equation addition circuit 166A. The class taps pertinent to the residual signals are sent to the classification unit 165E, whilst the prediction taps are sent to the normal equation addition circuit 166E.

[0311] Subsequently, at step S113, the classification unit 165A executes classification based on the class taps pertinent to the linear prediction coefficients, and sends the resulting class code to the normal equation addition circuit 166A, whilst the classification unit 165E executes classification based on the class taps pertinent to the residual signals, and sends the resulting class code to the normal equation addition circuit 166E.

[0312] The program then moves to step S114, where the normal equation addition circuit 166A performs the aforementioned summation of the matrix A and the vector v of the equation (13), for the linear prediction coefficients of the frame of interest as teacher data from the LPC analysis unit 161A and for the decoded linear prediction coefficients forming the prediction taps as pupil data from the tap generator 164A. At step S114, the normal equation addition circuit 166E performs the aforementioned summation of the matrix A and the vector v of the equation (13), for the residual signals of the frame of interest as teacher data from the prediction filter 161E and for the decoded residual signals forming the prediction taps as pupil data from the tap generator 164E. The program then moves to step S115.

[0313] At step S115, it is verified whether or not there remains any speech signal for learning of a frame to be processed as the frame of interest. If it is verified at step S115 that there remains a speech signal for learning of a frame to be processed as the frame of interest, the program reverts to step S111, where the next frame is set as a new frame of interest. The processing similar to that described above then is repeated.

[0314] If it is verified at step S115 that there is no speech signal for learning of a frame to be processed as the frame of interest, that is if the normal equation has been obtained in each class in the normal equation addition circuits 166A, 166E, the program moves to step S116, where the tap coefficient decision circuit 167A solves the normal equation generated for each class to find the tap coefficients for the linear prediction coefficients for each class. These tap coefficients are sent to the address of the coefficient memory 168A associated with each class for storage therein. The tap coefficient decision circuit 167E also solves the normal equation generated for each class to find the tap coefficients for the residual signals for each class. These tap coefficients are sent to and stored in the address of the coefficient memory 168E associated with each class, to terminate the processing.

[0315] The tap coefficients pertinent to the linear prediction coefficients for each class, thus stored in the coefficient memory 168A, are stored in the coefficient memory 145A of FIG. 14, while the tap coefficients pertinent to the class-based residual signals stored in the coefficient memory 168E are stored in the coefficient memory 145E of FIG. 14.

[0316] Consequently, the tap coefficients stored in the coefficient memory 145A of FIG. 14 have been found on learning so that the prediction errors of the prediction values of the true linear prediction coefficients, obtained on carrying out linear predictive calculations, herein square errors, will be statistically minimum, while the tap coefficients stored in the coefficient memory 145E of FIG. 14 have been found on learning so that the prediction errors of the prediction values of the true residual signals, obtained on carrying out linear predictive calculations, herein square errors, will also be statistically minimum. As a result, the linear prediction coefficients and the residual signals, output by the prediction units 146A, 146E of FIG. 14, are substantially coincident with the true linear prediction coefficients and with the true residual signals, respectively, so that the synthesized sound generated from these linear prediction coefficients and residual signals is low in distortion and of high sound quality.

[0317] If, in the speech synthesis device shown in FIG. 14, the class taps and prediction taps for the linear prediction coefficients are to be extracted by the tap generator 143A from both the decoded linear prediction coefficients and the decoded residual signals, it is necessary to cause the tap generator 164A of FIG. 17 likewise to extract the class taps and prediction taps for the linear prediction coefficients from both the decoded linear prediction coefficients and the decoded residual signals. The same holds for the tap generator 164E.

[0318] If, in the speech synthesis device shown in FIG. 14, the tap generators 143A, 143E, classification units 144A, 144E and the coefficient memories 145A, 145E are constructed as respective integral units, the tap generators 164A, 164E, classification units 165A, 165E, normal equation addition circuits 166A, 166E, tap coefficient decision circuits 167A, 167E and the coefficient memories 168A, 168E need to be constructed as respective integral units as well. In this case, in the normal equation addition circuit in which the normal equation addition circuits 166A, 166E are constructed unitarily, the normal equation is established with both the linear prediction coefficients output by the LPC analysis unit 161A and the residual signals output by the prediction filter 161E as teacher data at a time, and with both the decoded linear prediction coefficients output by the filter coefficient decoder 163A and the decoded residual signals output by the residual codebook storage unit 163E as pupil data at a time. In the tap coefficient decision circuit in which the tap coefficient decision circuits 167A, 167E are constructed unitarily, the normal equation is solved to find the tap coefficients for the linear prediction coefficients and for the residual signals for each class at a time.

[0319] An instance of the transmission system embodying the present invention is now explained with reference to FIG. 20. The system herein means a set of logically arrayed plural devices, while it does not matter whether or not the respective devices are in the same casing.

[0320] In this transmission system, the portable telephone sets 181₁, 181₂ perform radio transmission and receipt with base stations 182₁, 182₂, respectively, while the base stations 182₁, 182₂ perform transmission and receipt with an exchange station 183, to enable transmission and receipt of speech between the portable telephone sets 181₁, 181₂ with the aid of the base stations 182₁, 182₂ and the exchange station 183. The base stations 182₁, 182₂ may be the same as or different from each other.

[0321] The portable telephone sets 181₁, 181₂ are referred to below as a portable telephone set 181, unless there is a particular necessity for making distinctions between the two sets.

[0322] FIG. 21 shows an illustrative structure of the portable telephone set 181 shown in FIG. 20.

[0323] An antenna 191 receives electrical waves from the base stations 182₁, 182₂ to send the received signals to a modem 192, and also sends the signals from the modem 192 to the base stations 182₁, 182₂ as electrical waves. The modem 192 demodulates the signals from the antenna 191 to send the resulting code data, explained with reference to FIG. 1, to a receipt unit 194. The modem 192 also modulates the code data supplied from a transmission unit 193, such as shown in FIG. 1, and sends the resulting modulated signal to the antenna 191. The transmission unit 193 is configured similarly to the transmission unit shown in FIG. 1 and codes the user's speech input thereto into code data, which is sent to the modem 192. The receipt unit 194 receives the code data from the modem 192 to decode and output speech of high sound quality similar to that obtained in the speech synthesis device of FIG. 14.

[0324] That is, FIG. 22 shows an illustrative structure of the receipt unit 194 of FIG. 21. In the drawing, parts or components corresponding to those shown in FIG. 2 are depicted by the same reference numerals and are not explained specifically.

[0325] The tap generator 101 is fed with the frame-based or subframe-based L, G, I and A codes, output by a channel decoder 21. The tap generator 101 generates what are to be class taps from the L, G, I and A codes, to route the generated class taps to a classification unit 104. The class taps, constructed by e.g., the codes, generated by the tap generator 101, are sometimes referred to below as first class taps.

[0326] The tap generator 102 is fed with the frame-based or subframe-based residual signals e, output by the operating unit 28. The tap generator 102 extracts what are to be class taps (sample points) from the residual signals to route the resulting class taps to the classification unit 104. The tap generator 102 also extracts what are to be prediction taps from the residual signals from the operating unit 28 to route the resulting prediction taps to the prediction unit 106. The class taps, constructed by e.g., residual signals, generated by the tap generator 102, are sometimes referred to below as second class taps.

[0327] The tap generator 103 is fed with the frame-based or subframe-based linear prediction coefficients α_(p), output by the filter coefficient decoder 25. The tap generator 103 extracts what are to be class taps from the linear prediction coefficients to route the resulting class taps to the classification unit 104. The tap generator 103 also extracts what are to be prediction taps from the linear prediction coefficients from the filter coefficient decoder 25 to route the resulting prediction taps to the prediction unit 107. The class taps, constructed by e.g., the linear prediction coefficients, generated by the tap generator 103, are sometimes referred to below as third class taps.

[0328] The classification unit 104 integrates the first to third class taps, supplied from the tap generators 101 to 103, to form ultimate class taps. Based on these ultimate class taps, the classification unit 104 performs the classification to send the class code, as being the result of the classification, to the coefficient memory 105.

[0329] The coefficient memory 105 holds the tap coefficients pertinent to the class-based linear prediction coefficients and the tap coefficients pertinent to the residual signals, as obtained by the learning processing in the learning device of FIG. 23, as will be explained subsequently. The coefficient memory 105 outputs the tap coefficients stored in the address associated with the class code output by the classification unit 104 to the prediction units 106 and 107. Meanwhile, tap coefficients We pertinent to the residual signals are sent from the coefficient memory 105 to the prediction unit 106, while tap coefficients Wa pertinent to the linear prediction coefficients are sent from the coefficient memory 105 to the prediction unit 107.

[0330] Similarly to the prediction unit 146E of FIG. 14, the prediction unit 106 acquires the prediction taps output by the tap generator 102 and the tap coefficients pertinent to the residual signals, output by the coefficient memory 105, and performs the linear predictive calculations of the equation (6), using the prediction taps and the tap coefficients. In this manner, the prediction unit 106 finds a predicted value em of the residual signals of the frame of interest to send the predicted value em to the speech synthesis unit 29 as an input signal.

[0331] Similarly to the prediction unit 146A of FIG. 14, the prediction unit 107 acquires the prediction taps output by the tap generator 103 and the tap coefficients pertinent to the linear prediction coefficients output by the coefficient memory 105 and, using the prediction taps and the tap coefficients, executes the linear predictive calculations of the equation (6). So, the prediction unit 107 finds a predicted value mα_(p) of the linear prediction coefficients of the frame of interest to send the predicted value so found to the speech synthesis unit 29.

[0332] In the receipt unit 194, constructed as described above, processing which is basically the same as the processing conforming to the flowchart of FIG. 16 is carried out to output the synthesized speech of high sound quality as being the result of the speech decoding.

[0333] That is, the channel decoder 21 separates the L, G, I and A codes from the code data supplied thereto, to send the codes so separated to the adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and to the filter coefficient decoder 25, respectively. The L, G, I and A codes are also sent to the tap generator 101.

[0334] The adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and the operating units 26 to 28 perform the processing similar to that performed in the adaptive codebook storage unit 9, gain decoder 10, excitation codebook storage unit 11 and in the operating units 12 to 14 of FIG. 1, to decode the L, G and I codes into residual signals e. These residual signals are routed from the operating unit 28 to the tap generator 102.

[0335] As explained with reference to FIG. 1, the filter coefficientdecoder 25 decodes the A codes, supplied thereto, into linear predictioncoefficients, which are routed to the tap generator 103.

[0336] The tap generator 101 renders the frames of the L, G, I and Acodes, supplied thereto, the frame of interest. At step S101 (FIG. 16),the tap generator 101 generates first class taps from the L, G, I and Acodes from the channel decoder 21 to send the so generated first classtaps to the classification unit 104. At step S101, the tap generator 102generates second class taps from the decoded residual signals from theoperating unit 28 to send the so generated second class taps to theclassification unit 104, while the tap generator 103 generates the thirdclass taps from the linear prediction coefficients from the filtercoefficient decoder 25 to send the so generated third class taps to theclassification unit 104. At step S101, the tap generator 102 generateswhat are to be prediction taps from the residual signals from theoperating unit 28 to send the prediction taps to the prediction unit106, while the tap generator 102 generates prediction taps from thelinear prediction coefficients from the filter coefficient decoder 25 tosend the so generated prediction taps to the prediction unit 107.

[0337] At step S102, the classification unit 104 executes classification based on ultimate class taps which combine the first to third class taps supplied from the tap generators 101 to 103 and sends the resulting class codes to the coefficient memory 105. The program then moves to step S103.

[0338] At step S103, the coefficient memory 105 reads out the tap coefficients concerning the residual signals and the linear prediction coefficients, from the address associated with the class code as supplied from the classification unit 104, and sends the tap coefficients pertinent to the residual signals and the tap coefficients pertinent to the linear prediction coefficients to the prediction units 106, 107, respectively.

[0339] At step S104, the prediction unit 106 acquires the tap coefficients concerning the residual signals, output from the coefficient memory 105, and executes the sum-of-products processing of the equation (6), using the so acquired tap coefficients and the prediction taps from the tap generator 102, to acquire predicted values of true residual signals of the frame of interest. At this step S104, the prediction unit 107 also acquires the tap coefficients pertinent to the linear prediction coefficients output by the coefficient memory 105 and, using the so acquired tap coefficients and the prediction taps from the tap generator 103, performs the sum-of-products processing of the equation (6) to acquire predicted values of true linear prediction coefficients of the frame of interest.
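The coefficient lookup of step S103 and the sum-of-products processing of step S104 may be sketched as follows. This is an illustrative sketch only: it assumes that the equation (6) is a plain linear combination of prediction taps and tap coefficients, as the wording "sum-of-products" suggests, and the names predict, decode_frame and memory, as well as the toy coefficient values, are hypothetical and do not appear in the embodiment.

```python
def predict(taps, coeffs):
    """Sum-of-products of prediction taps and one set of tap coefficients."""
    if len(taps) != len(coeffs):
        raise ValueError("tap / coefficient length mismatch")
    return sum(w * x for w, x in zip(coeffs, taps))


def decode_frame(class_code, residual_taps, lpc_taps, coefficient_memory):
    """Steps S103 and S104 for one frame of interest.

    coefficient_memory maps a class code to a pair
    (rows for the residual signals, rows for the linear prediction
    coefficients); each row is one set of tap coefficients and yields
    one predicted value.
    """
    residual_rows, lpc_rows = coefficient_memory[class_code]   # step S103
    residual = [predict(residual_taps, row) for row in residual_rows]
    lpc = [predict(lpc_taps, row) for row in lpc_rows]         # step S104
    return residual, lpc


# Toy memory (hypothetical values): class 0 passes the taps through,
# class 1 averages the two taps.
memory = {
    0: ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    1: ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]),
}
residual, lpc = decode_frame(1, [2.0, 4.0], [0.25, 0.75], memory)
```

In this sketch the class code serves directly as the address of the coefficient memory, which mirrors the read-out from "the address associated with the class code" described above.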

[0340] The residual signals and the linear prediction coefficients, thusacquired, are routed to the speech synthesis unit 29, which thenperforms the processing of the equation (4), using the residual signalsand the linear prediction coefficients, to generate the synthesizedsound signal of the frame of interest. These synthesized sound signalsare sent from the speech synthesis unit 29 through the D/A converter 30to the loudspeaker 31 which then outputs the synthesized soundcorresponding to the synthesized sound signals.

[0341] After the residual signals and the linear prediction coefficientshave been acquired by the prediction units 106, 107, the program movesto step S105 where it is verified whether or not there are yet L, G, Ior A codes of the frame to be processed as the frame of interest. If itis found at step S105 that there are as yet the L, G, I or A codes ofthe frame to be processed as the frame of interest, the program revertsto step S101 to set the frame to be the next frame of interest as thenew frame of interest to repeat the processing similar to that describedabove. If it is found at step S105 that there are no L, G, I or A codesof the frame to be processed as the frame of interest, the processing isterminated.

[0342] An instance of a learning device for performing the learningprocessing of tap coefficients to be stored in the coefficient memory105 shown in FIG. 22 is now explained with reference to FIG. 23. In thefollowing explanation, parts or components common to those of thelearning device shown in FIG. 12 are depicted by corresponding referencenumerals.

[0343] The components from the microphone 201 to the code decision unit215 are configured similarly to the components from the microphone 1 tothe code decision unit 15. The microphone 201 is fed with speech signalsfor learning, so that the components from the microphone 201 to the codedecision unit 215 perform the processing similar to that shown in FIG.1.

[0344] A prediction filter 111E is fed with speech signals for learning,as digital signals, output by the A/D converter 202, and with the linearprediction coefficients, output by the LPC analysis unit 204. The tapgenerator 112A is fed with the linear prediction coefficients, output bythe vector quantizer 205, that is linear prediction coefficients formingthe code vectors (centroid vector) of the codebook used for vectorquantization, while the tap generator 112E is fed with residual signalsoutput by the operating unit 214, that is the same residual signals asthose sent to the speech synthesis filter 206. The normal equationaddition circuit 114A is fed with the linear prediction coefficientsoutput by the LPC analysis unit 204, whilst the tap generator 117 is fedwith the L, G, I and A codes output by the code decision unit 215.

[0345] The prediction filter 111E sequentially sets the frames of thespeech signals for learning, sent from the A/D converter 202, andexecutes e.g., the processing complying with the equation (1), using thespeech signals for the frame of interest and the linear predictioncoefficients supplied from the LPC analysis unit 204, to find theresidual signals for the frame of interest. These residual signals aresent as teacher data to the normal equation addition circuit 114E.

[0346] From the linear prediction coefficients, supplied from the vector quantizer 205, the tap generator 112A forms the same prediction taps as those in the tap generator 103 of FIG. 22, and third class taps, and routes the third class taps to the classification units 113A, 113E, while routing the prediction taps to the normal equation addition circuit 114A.

[0347] From the residual signals, supplied from the operating unit 214, the tap generator 112E forms the same prediction taps as those in the tap generator 102 of FIG. 22, and second class taps, and routes the second class taps to the classification units 113A, 113E, while routing the prediction taps to the normal equation addition circuit 114E.

[0348] The classification units 113A, 113E are fed with the third andsecond class taps, from the tap generators 112A, 112E, respectively,while being fed with the first class taps from the tap generator 117.Similarly to the classification unit 104 of FIG. 22, the classificationunits 113A, 113E integrate the first to third class taps, suppliedthereto, to form ultimate class taps. Based on these ultimate classtaps, the classification units perform the classification to send theclass code to the normal equation addition circuits 114A, 114E.

[0349] The normal equation addition circuit 114A receives the linear prediction coefficients of the frame of interest from the LPC analysis unit 204, as teacher data, while receiving the prediction taps from the tap generator 112A, as pupil data. The normal equation addition circuit 114A performs the summation, as does the normal equation addition circuit 166A of FIG. 17, for the teacher data and the pupil data, for each class code from the classification unit 113A, to set up the normal equation (13) pertinent to the linear prediction coefficients, from one class to another. The normal equation addition circuit 114E receives the residual signals of the frame of interest from the prediction filter 111E, as teacher data, while receiving the prediction taps from the tap generator 112E, as pupil data. The normal equation addition circuit 114E performs the summation, as does the normal equation addition circuit 166E of FIG. 17, for the teacher data and the pupil data, for each class code from the classification unit 113E, to set up the normal equation (13) pertinent to the residual signals, from one class to another. A tap coefficient decision circuit 115A and a tap coefficient decision circuit 115E solve the normal equations, generated in the normal equation addition circuits 114A, 114E, from class to class, to find tap coefficients pertinent to the linear prediction coefficients and the residual signals for the respective classes. The tap coefficients, thus found, are sent to the addresses of the coefficient memories 116A, 116E associated with the respective classes.

[0350] Depending on the speech signals provided as speech signals for learning, there are occasions wherein, in a class or classes, the number of normal equations required to find the tap coefficients cannot be produced in the normal equation addition circuits 114A, 114E. For such class(es), the tap coefficient decision circuits 115A, 115E output, e.g., default tap coefficients.

[0351] The coefficient memories 116A, 116E memorize the class-based tapcoefficients pertinent to linear prediction coefficients and residualsignals, supplied from the tap coefficient decision circuits 115A, 115E,respectively.

[0352] From the L, G, I and the A codes, supplied from the code decisionunit 215, the tap generator 117 generates the same first class taps asthose in the tap generator 101 of FIG. 22, to send the so generatedclass taps to the classification units 113A, 113E.

[0353] The above-described learning device basically performs the sameprocessing as the processing conforming to the flowchart of FIG. 19 tofind the tap coefficients necessary to produce the synthesized sound ofhigh sound quality.

[0354] The learning device is fed with the speech signals for learningand generates teacher data and pupil data at step S111 from the speechsignals for learning.

[0355] That is, the speech signals for learning are input to themicrophone 201. The components from the microphone 201 to the codedecision unit 215 perform the processing similar to that performed bythe microphone 1 to the code decision unit 15 of FIG. 1.

[0356] The linear prediction coefficients, acquired by the LPC analysis unit 204, are sent as teacher data to the normal equation addition circuit 114A. These linear prediction coefficients are also sent to the prediction filter 111E. The residual signals, obtained in the operating unit 214, are sent as pupil data to the tap generator 112E.

[0357] The digital speech signals, output by the A/D converter 202, aresent to the prediction filter 111E, while the linear predictioncoefficients, output by the vector quantizer 205, are sent as pupil datato the tap generator 112A. The L, G, I and A codes, output by the codedecision unit 215, are sent to the tap generator 117.

[0358] The prediction filter 111E sequentially renders the frames of thespeech signals for learning, supplied from the A/D converter 202, theframe of interest, and executes the processing conforming to theequation (1), using the speech signals of the frame of interest and thelinear prediction coefficients supplied from the LPC analysis unit 204,to find the residual signals of the frame of interest. The residualsignals, obtained by this prediction filter 111E, are sent as teacherdata to the normal equation addition circuit 114E.

[0359] After acquisition of the teacher and pupil data as describedabove, the program moves to step S112 where the tap generator 112Agenerates prediction taps pertinent to linear prediction coefficientssupplied from the vector quantizer 205, and third class taps, from thelinear prediction coefficients, while the tap generator 112E generatesthe prediction taps pertinent to residual signals supplied from theoperating unit 214, and the second class taps, from the residualsignals. Further, at step S112, the first class taps are generated bythe tap generator 117 from the L, G, I and A codes supplied from thecode decision unit 215.

[0360] The prediction taps pertinent to the linear prediction coefficients are sent to the normal equation addition circuit 114A, while the prediction taps pertinent to the residual signals are sent to the normal equation addition circuit 114E. The first to third class taps are sent to the classification units 113A, 113E.

[0361] Subsequently, at step S113, the classification units 113A, 113Eperform classification, based on the first to third class taps, to sendthe resulting class code to the normal equation addition circuits 114A,114E.

[0362] The program then moves to step S114, where the normal equationaddition circuit 114A performs the aforementioned summation of thematrix A and the vector v of the equation (13), for the linearprediction coefficients of the frame of interest from the LPC analysisunit 204, as teacher data, and for the prediction taps from the tapgenerator 112A, as pupil data, for each class code from theclassification unit 113A. At step S114, the normal equation additioncircuit 114E performs the aforementioned summation of the matrix A andthe vector v of the equation (13), for the residual signals of the frameof interest as teacher data from the prediction filter 111E and for theprediction taps as pupil data from the tap generator 112E, for eachclass code from the classification unit 113E. The program then moves tostep S115.

[0363] At step S115, it is verified whether or not there is any speechsignal for learning for the frame to be processed as the frame ofinterest. If it is verified at step S115 that there is any speech signalfor learning of the frame to be processed as the frame of interest, theprogram reverts to step S111 where the next frame is set as a new frameof interest. The processing similar to that described above then isrepeated.

[0364] If it is verified at step S115 that there is no speech signal forlearning of the frame to be processed as the frame of interest, that isif the normal equation is obtained in each class in the normal equationaddition circuits 114A, 114E, the program moves to step S116 where thetap coefficient decision circuit 115A solves the normal equationgenerated for each class to find the tap coefficients for the linearprediction coefficients for each class. These tap coefficients are sentto the address associated with each class of the coefficient memory 116Afor storage therein. The tap coefficient decision circuit 115E solvesthe normal equation generated for each class to find the tapcoefficients for the residual signals for each class. These tapcoefficients are sent to the address associated with each class of thecoefficient memory 116E for storage therein. This finishes theprocessing.
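The class-based summation of step S114 and the solution of step S116 may be illustrated by the following sketch. It assumes that the normal equation (13) has the usual least-squares form A w = v, with A accumulated from products of pupil data and v from products of pupil data and teacher data, and that one teacher value is predicted per set of taps. The class NormalEquationAdder, the function solve and the sample data are hypothetical names and values chosen for illustration, and the fallback to default tap coefficients follows the handling of classes for which the tap coefficients cannot be found.

```python
from collections import defaultdict


def solve(a, v):
    """Solve a w = v by Gauss-Jordan elimination with partial pivoting.

    Returns None when the system is (numerically) singular.
    """
    n = len(v)
    m = [row[:] + [rhs] for row, rhs in zip(a, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


class NormalEquationAdder:
    """Accumulates matrix A and vector v separately for each class code."""

    def __init__(self, n_taps):
        self.n = n_taps
        self.A = defaultdict(lambda: [[0.0] * n_taps for _ in range(n_taps)])
        self.v = defaultdict(lambda: [0.0] * n_taps)

    def add(self, class_code, pupil, teacher):
        """Step S114: add one (pupil taps, teacher value) pair."""
        A, v = self.A[class_code], self.v[class_code]
        for i in range(self.n):
            v[i] += pupil[i] * teacher
            for j in range(self.n):
                A[i][j] += pupil[i] * pupil[j]

    def decide(self, class_codes, default):
        """Step S116: solve per class, falling back to default coefficients."""
        out = {}
        for c in class_codes:
            w = solve(self.A[c], self.v[c]) if c in self.A else None
            out[c] = w if w is not None else list(default)
        return out


adder = NormalEquationAdder(2)
for pupil in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    adder.add(0, pupil, 2.0 * pupil[0] - 1.0 * pupil[1])  # teacher = 2*x0 - x1
adder.add(1, [1.0, 1.0], 3.0)  # a single sample: too few equations for 2 taps
coeffs = adder.decide([0, 1, 2], default=[1.0, 0.0])
```

In this toy run class 0 recovers coefficients close to 2 and -1, while class 1 (singular system) and class 2 (no data) receive the default set.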

[0365] The tap coefficients pertinent to the linear predictioncoefficients for each class, thus stored in the coefficient memory 116A,are stored in the coefficient memory 105 of FIG. 22, while the tapcoefficients pertinent to the class-based residual signals stored in thecoefficient memory 116E are stored in the same coefficient memory.

[0366] Consequently, the tap coefficients stored in the coefficientmemory 105 of FIG. 22 have been found on learning so that the predictionerrors of the prediction values of the true linear predictioncoefficients or residual signals, obtained on carrying out linearpredictive calculations, herein square errors, will be statisticallyminimum, and hence the residual signals and the linear predictioncoefficients, output by the prediction units 106, 107 of FIG. 22, aresubstantially coincident with the true residual signals and with thetrue linear prediction coefficients, respectively, with the result thatthe synthesized sound generated by these residual signals and the linearprediction coefficients are free of distortion and of high soundquality.

[0367] The above-described sequence of operations may be carried out byhardware or by software. If the sequence of operations is carried out bysoftware, the program forming the software is installed on e.g., ageneral-purpose computer.

[0368] The computer on which is installed the program for executing theabove-described sequence of operations is configured as shown in FIG. 13as described above and the operation similar to that performed by thecomputer shown in FIG. 13 is executed, and hence is not explainedspecifically for simplicity.

[0369] Referring to the drawings, a further modification of the presentinvention is hereinafter explained.

[0370] The speech synthesis device is fed with code data multiplexedfrom the residual code and the A code encoded e.g., on vectorquantization from the residual signals and the linear predictioncoefficients applied to a speech synthesis filter 244. From the residualcode and the A code, the residual signals and the linear predictioncoefficients are decoded and sent to the speech synthesis filter 244 togenerate the synthesized sound. The present speech synthesis device isdesigned to perform predictive processing, using the synthesized soundsynthesized by the speech synthesis filter and the tap coefficients asfound on learning to find and output the speech of high sound quality(synthesized sound) which is the synthesized sound improved in soundquality.

[0371] That is, the speech synthesis device, shown in FIG. 24, exploitsthe classification adaptive processing to decode the synthesized soundinto predicted values of the true speech of high sound quality.

[0372] The classification adaptive processing is comprised of theclassification processing and the adaptive processing. By theclassification processing, data are classified according to propertiesand subjected to adaptive processing from class to class. The adaptiveprocessing is carried out in the manner as described above and hencereference may be made to the previous description to omit the detaileddescription here for simplicity.

[0373] The speech synthesis device, shown in FIG. 24, decodes thedecoded linear prediction coefficients to true linear predictioncoefficients, more precisely predicted values thereof, by theabove-described classification adaptive processing, while decoding thedecoded residual signals to true residual signals, more preciselypredicted values thereof.

[0374] That is, a demultiplexer (DEMUX) 241 is fed with code data andseparates the frame-based A code and residual code from the code datasupplied thereto. The demultiplexer 241 sends the A code to a filtercoefficient decoder 242 and to tap generators 245, 246 to send theresidual code to a residual codebook storage unit 243 and to tapgenerators 245, 246.

[0375] It should be noted that the A code and the residual code,contained in the code data of FIG. 24, are obtained on vectorquantization of the linear prediction coefficients and the residualsignals, both obtained on LPC analyzing the speech, using a presetcodebook.

[0376] The filter coefficient decoder 242 decodes the frame-based Acode, supplied from the demultiplexer 241, into linear predictioncoefficients, based on the same codebook as that used in producing the Acode, to send the so decoded linear prediction coefficients to thespeech synthesis filter 244.

[0377] The residual codebook storage unit 243 decodes the frame-basedresidual code, supplied from the demultiplexer 241, based on the samecodebook as that used in obtaining the residual code, to send theresulting residual signals to the speech synthesis filter 244.

[0378] Similarly to the speech synthesis filter 29, shown in FIG. 2, thespeech synthesis filter 244 is an IIR type digital filter, and filtersthe residual signals from the residual codebook storage unit 243, as aninput signal, with the linear prediction coefficients from the filtercoefficient decoder 242 as tap coefficients of the IIR filter, togenerate the synthesized sound, which is sent to the tap generators 245,246.

[0379] The tap generator 245 extracts, from the sample values of thesynthesized sound sent from the speech synthesis filter 244, and fromthe residual code and the code A, supplied from the demultiplexer 241,what are to be prediction taps used in predictive calculations in aprediction unit 249 as later explained. That is, the tap generator 245sets the A code, residual code and the sample values of the synthesizedsound of the frame of interest, for which predicted values of the highsound quality speech, for example, are to be found, as the predictiontaps. The tap generator 245 routes the prediction taps to the predictionunit 249.

[0380] The tap generator 246 extracts what are to be class taps from thesample values of the synthesized sound supplied from the speechsynthesis filter 244, and from the frame- or subframe-based A code andthe residual code supplied from the demultiplexer 241. Similarly to thetap generator 245, the tap generator 246 sets all of the sample valuesof the synthesized sound of the frame of interest, the A code and theresidual code, as the class taps. The tap generator 246 sends the classtaps to a classification unit 247.

[0381] The pattern of configuration of the prediction and class taps isnot to be limited to the above-mentioned pattern. Although the class andprediction taps are the same in the above case, the class taps and theprediction taps may be different in configuration from each other.

[0382] In the tap generator 245 or 246, the class taps and theprediction taps can also be extracted from the linear predictioncoefficients, obtained from the A code, output from the filtercoefficient decoder 242, or from the residual signals obtained from theresidual codes, output from the residual codebook storage unit 243, asindicated by dotted lines in FIG. 24.

[0383] Based on the class taps from the tap generator 246, theclassification unit 247 classifies the speech sample values of the frameof interest, and outputs the class code, corresponding to the resultingclass, to a coefficient memory 248.

[0384] It is also possible for the classification unit 247 to output, as the class code, the bit strings per se forming the class taps, that is the sample values of the synthesized sound of the frame of interest, the A code and the residual code.

[0385] The coefficient memory 248 holds class-based tap coefficients,obtained on learning in the learning device of FIG. 27, as laterexplained, and outputs to the prediction unit 249 the tap coefficientsstored in the address corresponding to the class code output by theclassification unit 247.

[0386] If N samples of the speech of the high sound quality are to be found for each frame, N sets of tap coefficients are needed to obtain N samples of the speech by the predictive calculations of the equation (6) for the frame of interest. Thus, in the present case, N sets of the tap coefficients are stored in the address of the coefficient memory 248 associated with one class code.

[0387] The prediction unit 249 acquires the prediction taps output bythe tap generator 245 and the tap coefficients output by the coefficientmemory 248 and performs linear predictive calculations as indicated bythe equation (6) to find predicted values of the speech of the highsound quality of the frame of interest to output the resulting predictedvalues to a D/A converter 250.

[0388] The coefficient memory 248 outputs N sets of tap coefficients forfinding each of N samples of the speech of the frame of interest, asdescribed above. The prediction unit 249 executes the sum-of-productsprocessing of the equation (6), using the prediction taps for respectivesample values and a set of tap coefficients associated with therespective sample values.

[0389] The D/A converter 250 converts the prediction values of the speech from the prediction unit 249 from digital signals into analog signals, which are sent to and output at the loudspeaker 251.

[0390]FIG. 25 shows a specified structure of the speech synthesis filter244 shown in FIG. 24. The speech synthesis filter 244, shown in FIG. 25,uses p-dimensional linear prediction coefficients, and hence is formedby an adder 261, p delay circuits (D) 262 ₁ to 262 _(p) and pmultipliers 263 ₁ to 263 _(p).

[0391] In the multipliers 263 ₁ to 263 _(p) are set p-dimensional linearprediction coefficients α₁, α₂, . . . , α_(p), supplied from the filtercoefficient decoder 242, so that the speech synthesis filter 244performs the calculations conforming to the equation (4) to generate thesynthesized sound.

[0392] That is, the residual signals e, output by the residual codebook storage unit 243, are sent through an adder 261 to a delay circuit 262 ₁. The delay circuit 262 _(p) delays the input signal thereto by one sample of the residual signals to output the resulting delayed signal to a downstream side delay circuit 262 _(p+1) and to a multiplier 263 _(p). The multiplier 263 _(p) multiplies an output of the delay circuit 262 _(p) with the linear prediction coefficient α_(p) set thereat to output the product value to the adder 261.

[0393] The adder 261 sums all outputs of the multipliers 263 ₁ to 263_(p) and the residual signals e to send the resulting sum to a delaycircuit 262 ₁ as well as to output the result of speech synthesis(synthesized sound).
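The recursive structure of the speech synthesis filter 244 may be sketched as follows. The equation (4) is not reproduced in this portion of the description, so the sketch assumes the sign convention implied by the relation E=(1+α₁z⁻¹+ . . . +α_(p)z^(−p))S given for the prediction filter, that is s[n] = e[n] − (α₁s[n−1] + . . . + α_(p)s[n−p]). The function name synthesize, the optional history argument and the sample values are hypothetical.

```python
def synthesize(residual, alpha, history=None):
    """All-pole (IIR) synthesis: s[n] = e[n] - sum_i alpha[i-1] * s[n-i].

    alpha   : p linear prediction coefficients (alpha_1 .. alpha_p)
    history : optional past outputs [s[-1], s[-2], ..., s[-p]];
              zeros are assumed when it is omitted.
    """
    p = len(alpha)
    # Delay line: state[0] holds s[n-1], state[p-1] holds s[n-p].
    state = list(history) if history is not None else [0.0] * p
    out = []
    for e in residual:
        s = e - sum(a * d for a, d in zip(alpha, state))
        out.append(s)
        state = [s] + state[:-1]   # shift the delay line by one sample
    return out


# First-order example: an impulse decays geometrically.
impulse_response = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```

Each pass through the loop corresponds to one clock of the delay circuits: the newest output enters the first delay stage and the oldest one is discarded.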

[0394] Referring to the flowchart of FIG. 26, the speech synthesisprocessing of the speech synthesis device of FIG. 24 is explained.

[0395] The demultiplexer 241 sequentially separates the A code and theresidual code, from the code data supplied thereto, on the frame basis,to send the respective codes to the filter coefficient decoder 242 andto the residual codebook storage unit 243. The demultiplexer 241 alsosends the A code and the residual code to the tap generators 245, 246.

[0396] The filter coefficient decoder 242 sequentially decodes theframe-based A code, supplied from the demultiplexer 241, into linearprediction coefficients, which are then sent to the speech synthesisfilter 244. The residual codebook storage unit 243 sequentially decodesthe frame-based residual code, supplied from the demultiplexer 241, intoresidual signals, which are then sent to the speech synthesis filter244.

[0397] The speech synthesis filter 244 then performs the calculations ofthe equation (4), using the residual signals and the linear predictioncoefficients, supplied thereto, to generate the synthesized sound of theframe of interest. This synthesized sound is sent to the tap generators245, 246.

[0398] The tap generator 245 sequentially renders the frame of thesynthesized sound, supplied thereto, the frame of interest. At stepS201, the tap generator 245 generates prediction taps, from the samplevalues of the synthesized sound supplied from the speech synthesisfilter 244 and from the A code and the residual code, supplied from thedemultiplexer 241, to output the so generated prediction taps to theprediction unit 249. At step S201, the tap generator 246 generates classtaps, from the synthesized sound sent from the speech synthesis filter244 and from the A code and the residual code, supplied from thedemultiplexer 241, to route the so generated class taps to theclassification unit 247.

[0399] At step S202, the classification unit 247 executes theclassification, based on the class taps supplied from the tap generator246, to send the resulting class code to the coefficient memory 248. Theprogram then moves to step S203.

[0400] At step S203, the coefficient memory 248 reads out the tap coefficients from the address associated with the class code sent from the classification unit 247 to send the so read out tap coefficients to the prediction unit 249.

[0401] At step S204, the prediction unit 249 acquires the tapcoefficients output by the coefficient memory 248 and, using the tapcoefficients and the prediction taps from the tap generator 245,executes the sum-of-products processing of the equation (6) to acquirepredicted values of the speech of high sound quality of the frame ofinterest. The speech of the high sound quality is sent to and output atthe loudspeaker 251 from the prediction unit 249 through the D/Aconverter 250.

[0402] After the speech of the high sound quality is obtained at theprediction unit 249, the program moves to step S205 where it is verifiedwhether or not there is any frame to be processed as the frame ofinterest. If it is verified at step S205 that there is any frame to beprocessed as the frame of interest, the program reverts to step S201where a frame which is to become the next frame of interest is set as anew frame of interest. The similar processing is then repeated. If it isverified at step S205 that there is no frame to be processed, the speechsynthesis processing is terminated.

[0403]FIG. 27 is a block diagram showing an instance of a learningdevice adapted for performing the learning of the tap coefficients to bestored in the coefficient memory 248 shown in FIG. 24.

[0404] The learning device shown in FIG. 27 is fed with digital speechsignals for learning of high sound quality, in terms of a preset frameas a unit. The digital speech signals for learning are sent to an LPCanalysis unit 271 and to a prediction filter 274. The digital speechsignals for learning are also sent as teacher data to a normal equationaddition circuit 281.

[0405] The LPC analysis unit 271 sequentially renders the frames of the speech signals, sent thereto, the frame of interest, and LPC-analyzes the speech signals of the frame of interest to find p-dimensional linear prediction coefficients, which then are sent to a vector quantizer 272 and to the prediction filter 274.

[0406] The vector quantizer 272 holds a codebook which associates codevectors having the linear prediction coefficients as the code vectorswith the codes and, based on this codebook, vector-quantizes the featurevector formed by linear prediction coefficients of the frame of interestfrom the LPC analysis unit 271 to send the A code resulting from thevector quantization to the filter coefficient decoder 273 and to tapgenerators 278, 279.

[0407] The filter coefficient decoder 273 holds the same codebook asthat stored in a vector quantizer 272 and, based on this codebook,decodes the A code from the vector quantizer 272 into linear predictioncoefficients, which are sent to a speech synthesis filter 277. It shouldbe noted that the filter coefficient decoder 242 of FIG. 24 is of thesame structure as the filter coefficient decoder 273 of FIG. 27.

[0408] The prediction filter 274 performs the calculations conforming tothe equation (1), using the speech signals of the frame of interest,supplied thereto, and the linear prediction coefficients from the LPCanalysis unit 271, to find the residual signals of the frame ofinterest, which are routed to a vector quantizer 275.

[0409] That is, if the Z-transforms of s_(n) and e_(n) in the equation (1) are represented by S and E, respectively, the equation (1) may be represented by:

E = (1 + α₁z⁻¹ + α₂z⁻² + . . . + α_(p)z^(−p))S  (16)

[0410] From the equation (16), the prediction filter 274 for finding the residual signals e may be designed as an FIR (Finite Impulse Response) digital filter.

[0411]FIG. 28 shows an illustrative structure of the prediction filter274.

[0412] The prediction filter 274 is fed with p-dimensional linearprediction coefficients from the LPC analysis unit 271. So, theprediction filter 274 is made up of p delay circuits (D) 291 ₁ to 291_(p), p multipliers 292 ₁ to 292 _(p) and a sole adder 293.

[0413] In the multipliers 292 ₁ to 292 _(p), there are set p-dimensionallinear prediction coefficients α₁, α₂, . . . , α_(p) supplied from theLPC analysis unit 271.

[0414] On the other hand, the speech signals s of the frame of interestare sent to a delay circuit 291 ₁ and to an adder 293. The delay circuit291 _(p) delays the input signal thereat by one sample of the residualsignals to output the delayed signal to a downstream side delay circuit291 _(p+1) and to an operating unit 292 _(p). The multiplier 292 _(p)multiplies the output of the delay circuit 291 _(p) with the linearprediction coefficient α_(p) set thereat to send the result of additionas the residual signals e to the adder 293.

[0415] The adder 293 sums all outputs of the multipliers 292₁ to 292ₚ and the speech signals s, and outputs the result of the addition as the residual signals e.
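The structure of FIG. 28 may be sketched in software as follows. This is a minimal illustration only, assuming the form e_n = s_n + α₁s_(n−1) + . . . + αₚs_(n−p) implied by the equation (16); the function name `prediction_filter` and the zero initial state of the delay circuits are assumptions made for this sketch and are not taken from the embodiment.

```python
def prediction_filter(speech, alphas):
    """FIR prediction filter: e[n] = s[n] + sum_k alphas[k-1] * s[n-k].

    `speech` is a list of samples s of the frame of interest and `alphas`
    holds the linear prediction coefficients (alpha_1 ... alpha_p).
    The delay circuits are assumed to start from zero (hypothetical choice).
    """
    p = len(alphas)
    delay_line = [0.0] * p  # outputs of the p delay circuits, newest first
    residuals = []
    for s in speech:
        # Each multiplier scales one delayed sample; the adder sums the
        # products together with the undelayed input sample.
        e = s + sum(a * d for a, d in zip(alphas, delay_line))
        residuals.append(e)
        # Shift the delay line by one sample.
        delay_line = ([s] + delay_line)[:p]
    return residuals
```

For instance, with a single coefficient of 0.5, each residual sample is the current speech sample plus half of the preceding one.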

[0416] Referring to FIG. 27, the vector quantizer 275 holds a codebook which associates codes with code vectors having sample values of the residual signals as components and, based on this codebook, vector-quantizes the residual vector, constituted by sample values of the residual signals e of the frame of interest from the prediction filter 274, to send the residual code resulting from the vector quantization to the residual codebook storage unit 276 and to the tap generators 278, 279.

[0417] The residual codebook storage unit 276 holds the same codebook as that stored in the vector quantizer 275 and, based on this codebook, decodes the residual code from the vector quantizer 275 into residual signals which are sent to the speech synthesis filter 277. It should be noted that the stored contents of the residual codebook storage unit 243 of FIG. 24 are the same as the stored contents of the residual codebook storage unit 276 of FIG. 27.
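The relation between the vector quantizer 275 and the residual codebook storage unit 276, which share one codebook, may be illustrated by the following sketch. The squared Euclidean distance used to select the code vector is an assumption of this sketch, since the distance measure is not specified here; the names `vector_quantize` and `decode_code` are likewise hypothetical.

```python
def vector_quantize(vector, codebook):
    """Return the code (index) of the code vector nearest to `vector`.

    Nearness is measured by squared Euclidean distance, which is an
    assumption of this sketch rather than a stated detail.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda code: squared_distance(vector, codebook[code]))


def decode_code(code, codebook):
    """Decode a code back into its code vector using the same codebook."""
    return list(codebook[code])
```

Because both sides hold the same codebook, decoding the code reproduces the code vector selected at quantization, not the original residual vector; the difference is the quantization error.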

[0418] The speech synthesis filter 277 is an IIR type digital filter, constructed similarly to the speech synthesis filter 244 of FIG. 24, and filters the residual signals from the residual codebook storage unit 276, as an input signal, with the linear prediction coefficients from the filter coefficient decoder 273 as tap coefficients of the IIR filter, to generate the synthesized sound, which is sent to the tap generators 278, 279.
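An IIR filter of this kind may be sketched as the inverse of the prediction filter, that is, by solving the equation (16) for S so that s_n = e_n − (α₁s_(n−1) + . . . + αₚs_(n−p)). The function name `synthesis_filter` and the zero initial state are assumptions of this sketch.

```python
def synthesis_filter(residuals, alphas):
    """IIR synthesis filter: s[n] = e[n] - sum_k alphas[k-1] * s[n-k].

    `residuals` is the decoded residual signal and `alphas` holds the
    decoded linear prediction coefficients. Past outputs are assumed to
    be zero before the first sample (hypothetical choice).
    """
    p = len(alphas)
    history = [0.0] * p  # previous output samples, newest first
    synthesized = []
    for e in residuals:
        # Feed back the previously synthesized samples.
        s = e - sum(a * h for a, h in zip(alphas, history))
        synthesized.append(s)
        history = ([s] + history)[:p]
    return synthesized
```

When the residual signals and the coefficients are not quantized, this filter restores the original samples exactly; with vector-quantized inputs the output is the synthesized sound, which differs from the speech for learning.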

[0419] Similarly to the tap generator 245 of FIG. 24, the tap generator 278 forms prediction taps from the synthesized sound from the speech synthesis filter 277, the A code supplied from the vector quantizer 272 and the residual code supplied from the vector quantizer 275, to send the so formed prediction taps to the normal equation addition circuit 281. Also, the tap generator 279, similarly to the tap generator 246 in FIG. 24, forms class taps from the synthesized sound from the speech synthesis filter 277, the A code supplied from the vector quantizer 272 and the residual code supplied from the vector quantizer 275, to send the so formed class taps to the classification unit 280.

[0420] Similarly to the classification unit 247 of FIG. 24, the classification unit 280 performs classification based on the class taps, supplied thereto, to send the resulting class code to the normal equation addition circuit 281.

[0421] The normal equation addition circuit 281 executes summation of the speech for learning, which is the speech of high sound quality of the frame of interest, as teacher data, and prediction taps from the tap generator 278, as pupil data.

[0422] That is, the normal equation addition circuit 281 performs calculations corresponding to mutual multiplication (x_(in)x_(im)) and summation (Σ) of pupil data, as respective components in the aforementioned matrix A of the equation (13), using the prediction taps (pupil data), separately for each class corresponding to the class code supplied from the classification unit 280.

[0423] Moreover, the normal equation addition circuit 281 performs calculations corresponding to multiplication (x_(in)y_(i)) and summation (Σ) of pupil data and teacher data, as respective components in the vector v of the equation (13), using the pupil data and the teacher data, separately for each class corresponding to the class code supplied from the classification unit 280.

[0424] The aforementioned summation by the normal equation addition circuit 281 is carried out with the totality of the speech frames for learning, supplied thereto, to set a normal equation (13) for each class.

[0425] A tap coefficient decision circuit 282 solves the normal equation, generated in the normal equation addition circuit 281, from class to class, to find tap coefficients pertinent to the linear prediction coefficients and the residual signals for the respective classes. The tap coefficients, thus found, are sent to the addresses of the coefficient memory 283 associated with the respective classes.

[0426] Depending on the speech signals, provided as speech signals for learning, there are occasions wherein, in a certain class or classes, a number of the normal equations required to find the tap coefficients cannot be produced in the normal equation addition circuit 281. For such class(es), the tap coefficient decision circuit 282 outputs e.g., default tap coefficients.

[0427] The coefficient memory 283 memorizes the class-based tap coefficients supplied from the tap coefficient decision circuit 282 in an address associated with the class.
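The class-based summation and the subsequent solution of the normal equation may be sketched as follows. The sketch assumes that the equation (13) has the form A·w = v, with the components of A being Σ x_(in)x_(im) and those of v being Σ x_(in)y_(i), as described above. The class name `NormalEquationAccumulator`, the use of Gauss-Jordan elimination and the pivot threshold `eps` are choices made for this illustration only; the solving method of the tap coefficient decision circuit is not limited thereto.

```python
def solve_linear(A, v, eps=1e-12):
    """Solve A w = v by Gauss-Jordan elimination with partial pivoting.

    Returns None when the system is (numerically) singular, that is,
    when too few independent equations have been accumulated.
    """
    n = len(v)
    M = [row[:] + [rhs] for row, rhs in zip(A, v)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


class NormalEquationAccumulator:
    """Per-class summation of matrix A and vector v (hypothetical helper)."""

    def __init__(self, num_taps):
        self.n = num_taps
        self.A = {}  # class code -> n x n matrix of sums of x_n * x_m
        self.v = {}  # class code -> vector of sums of x_n * y

    def add(self, class_code, taps, teacher):
        """Add one pair of pupil data (taps) and teacher data (teacher)."""
        A = self.A.setdefault(class_code,
                              [[0.0] * self.n for _ in range(self.n)])
        v = self.v.setdefault(class_code, [0.0] * self.n)
        for i in range(self.n):
            v[i] += taps[i] * teacher
            for j in range(self.n):
                A[i][j] += taps[i] * taps[j]

    def solve(self, class_code, default):
        """Return the tap coefficients of a class, or `default` if the
        normal equation of that class cannot be solved."""
        if class_code not in self.A:
            return list(default)
        w = solve_linear(self.A[class_code], self.v[class_code])
        return w if w is not None else list(default)
```

A class for which no data, or only linearly dependent data, has been accumulated falls back to the supplied default tap coefficients, which mirrors the handling described for classes lacking the required number of normal equations.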

[0428] Referring to the flowchart of FIG. 29, the learning processing of the learning device of FIG. 27 is explained.

[0429] The learning device is fed with speech signals for learning. The speech signals for learning are sent to the LPC analysis unit 271 and to the prediction filter 274, while being sent as teacher data to the normal equation addition circuit 281. At step S211, pupil data are generated from the speech signals for learning, as teacher data.

[0430] Specifically, the LPC analysis unit 271 sequentially sets the frames of the speech signals for learning as the frame of interest and LPC-analyzes the speech signals of the frame of interest to find p-dimensional linear prediction coefficients which are sent to the vector quantizer 272. The vector quantizer 272 vector-quantizes the feature vector formed by linear prediction coefficients of the frame of interest from the LPC analysis unit 271 to send the A code obtained on such vector quantization as pupil data to the filter coefficient decoder 273 and to the tap generators 278, 279. The filter coefficient decoder 273 decodes the A code from the vector quantizer 272 into linear prediction coefficients, which then are routed to the speech synthesis filter 277.

[0431] On receipt of the linear prediction coefficients of the frame of interest from the LPC analysis unit 271, the prediction filter 274 executes the calculations of the equation (1), using the linear prediction coefficients and the speech signals for learning of the frame of interest, to find the residual signals of the frame of interest, which are then routed to the vector quantizer 275. The vector quantizer 275 vector-quantizes the residual vector, formed by sample values of the residual signals of the frame of interest from the prediction filter 274, and routes the residual code obtained on vector quantization as pupil data to the residual codebook storage unit 276 and to the tap generators 278, 279. The residual codebook storage unit 276 decodes the residual code from the vector quantizer 275 into residual signals which are supplied to the speech synthesis filter 277.

[0432] Thus, on receipt of the linear prediction coefficients and the residual signals, the speech synthesis filter 277 synthesizes the speech, using the linear prediction coefficients and the residual signals, and sends the resulting synthesized sound as pupil data to the tap generators 278, 279.

[0433] The program then moves to step S212 where the tap generators 278, 279 generate prediction taps and class taps, respectively, from the synthesized sound supplied from the speech synthesis filter 277, the A code supplied from the vector quantizer 272 and the residual code supplied from the vector quantizer 275. The prediction taps and the class taps are sent to the normal equation addition circuit 281 and to the classification unit 280, respectively.

[0434] Subsequently, at step S213, the classification unit 280 performs classification, based on the class taps from the tap generator 279, to send the resulting class code to the normal equation addition circuit 281.

[0435] The program then moves to step S214, where the normal equation addition circuit 281 performs the aforementioned summation of the matrix A and the vector v of the equation (13), for the sample values of the speech of high sound quality of the frame of interest, supplied thereto, as teacher data, and for the prediction taps from the tap generator 278, as pupil data, for each class code from the classification unit 280. The program then moves to step S215.

[0436] At step S215, it is verified whether or not there is any speech signal for learning of a frame to be processed as the frame of interest. If it is verified at step S215 that there is such a speech signal for learning, the program reverts to step S211 where the next frame is set as a new frame of interest. The processing similar to that described above then is repeated.

[0437] If it is verified at step S215 that there is no speech signal for learning of the frame to be processed as the frame of interest, that is if the normal equation is obtained in each class in the normal equation addition circuit 281, the program moves to step S216 where the tap coefficient decision circuit 282 solves the normal equation generated for each class to find the tap coefficients for each class. These tap coefficients are sent to the address associated with each class of the coefficient memory 283 for storage therein. This finishes the processing.

[0438] The class-based tap coefficients, thus stored in the coefficient memory 283, are stored in the coefficient memory 248 of FIG. 24.

[0439] Consequently, the tap coefficients stored in the coefficient memory 248 of FIG. 24 have been found on learning so that the prediction errors of the prediction values of the true speech of high sound quality, obtained on carrying out linear predictive calculations, herein square errors, will be statistically minimum, so that the residual signals and the linear prediction coefficients, output by the prediction unit 249 of FIG. 24, are free of distortion proper to the synthesized sound produced in the speech synthesis filter 244 and hence of high sound quality.

[0440] If, in the tap generator 246 in the speech synthesis device, shown in FIG. 24, the class taps are to be extracted from the linear prediction coefficients and the residual signals, it is necessary for the tap generator 279 of FIG. 27 to extract similar class taps from the linear prediction coefficients generated by the filter coefficient decoder 273 or from the residual signals output by the residual codebook storage unit 276, as shown with dotted lines. The same holds for the prediction taps generated by the tap generator 245 of FIG. 24 or by the tap generator 278 of FIG. 27.

[0441] For simplifying the explanation in the above case, the classification is carried out with the bit string forming the class tap being directly used as the class code. In this case, however, the number of the classes may be of an exorbitant value. Thus, in the classification, the class taps may be compressed by e.g., vector quantization to use the bit string resulting from the compression as the class code.

[0442] An instance of the transmission system embodying the present invention is now explained with reference to FIG. 30. The system herein means a set of logically arrayed plural devices, while it does not matter whether or not the respective devices are in the same casing.

[0443] In this transmission system, the portable telephone sets 401₁, 401₂ perform radio transmission and receipt with base stations 402₁, 402₂, respectively, while the base stations 402₁, 402₂ perform speech transmission and receipt with an exchange station 403 to enable speech transmission and receipt between the portable telephone sets 401₁, 401₂ with the aid of the base stations 402₁, 402₂ and the exchange station 403. The base stations 402₁, 402₂ may be the same as or different from each other.

[0444] The portable telephone sets 401₁, 401₂ are referred to below as a portable telephone set 401, unless there is a particular necessity for making distinctions between the two sets.

[0445] FIG. 31 shows an illustrative structure of the portable telephone set 401 shown in FIG. 30.

[0446] An antenna 411 receives electrical waves from the base stations 402₁, 402₂ to send the received signals to a modem 412, and sends the signals from the modem 412 to the base stations 402₁, 402₂ as electrical waves. The modem 412 demodulates the signals from the antenna 411 to send the resulting code data, as explained in FIG. 1, to a receipt unit 414. The modem 412 also modulates the code data from the transmission unit 413, as shown in FIG. 1, and sends the resulting modulated signal to the antenna 411. The transmission unit 413 is configured similarly to the transmission unit shown in FIG. 1 and codes the user's speech input thereto into code data which is sent to the modem 412. The receipt unit 414 receives the code data from the modem 412 to decode and output the speech of high sound quality similar to that obtained in the speech synthesis device of FIG. 24.

[0447] That is, FIG. 32 shows an illustrative structure of the receipt unit 414 of the portable telephone set 401 shown in FIG. 31. In the drawing, parts or components corresponding to those shown in FIG. 2 are depicted by the same reference numerals and are not explained specifically.

[0448] The frame-based synthesized sound, output by the speech synthesis unit 29, and the frame-based or subframe-based L, G, I and A codes, output by a channel decoder 21, are sent to tap generators 221, 222. The tap generators 221, 222 extract what are to be the prediction taps and what are to be class taps from the synthesized sound, L code, G code, I code and the A code, supplied thereto. The prediction taps are sent to a prediction unit 225, while the class taps are sent to the classification unit 223.

[0449] The classification unit 223 performs classification based on the class taps supplied from the tap generator 222 to route the class codes resulting from the classification to a coefficient memory 224.

[0450] The coefficient memory 224 holds the class-based tap coefficients, obtained on learning by the learning device of FIG. 33, which will be explained subsequently. The coefficient memory 224 sends the tap coefficients stored in the address associated with the class code output by the classification unit 223 to the prediction unit 225.

[0451] Similarly to the prediction unit 249 of FIG. 24, the prediction unit 225 acquires the prediction taps output by the tap generator 221 and the tap coefficients output by the coefficient memory 224 and, using the prediction taps and the tap coefficients, performs the linear predictive calculations shown in equation (6). In this manner, the prediction unit 225 finds the predicted values of the speech of high sound quality of the frame of interest to route the so found predicted values to the D/A converter 30.

[0452] The receipt unit 414, constructed as described above, performs the processing which is basically in keeping with the flowchart of FIG. 26 to output a synthesized sound of high sound quality as the result of speech decoding.

[0453] That is, the channel decoder 21 separates the L, G, I and A codes from the code data, supplied thereto, to send the so separated codes to the adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and to the filter coefficient decoder 25, respectively. The L, G, I and A codes are also sent to the tap generators 221, 222.

[0454] The adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and the operating units 26 to 28 perform the processing similar to that performed in the adaptive codebook storage unit 9, gain decoder 10, excitation codebook storage unit 11 and in the operating units 12 to 14 of FIG. 1 to decode the L, G and I codes to residual signals e. These residual signals are routed to the speech synthesis unit 29.

[0455] As explained with reference to FIG. 1, the filter coefficient decoder 25 decodes the A codes, supplied thereto, into linear prediction coefficients, which are routed to the speech synthesis unit 29. The speech synthesis unit 29 performs speech synthesis, using the residual signals routed thereto and the linear prediction coefficients from the filter coefficient decoder 25, to send the resulting synthesized sound to the tap generators 221, 222.

[0456] The tap generator 221 sequentially sets the frames of the synthesized sound output from the speech synthesis unit 29 as the frame of interest. At step S201, the tap generator 221 generates prediction taps from the synthesized sound of the frame of interest, and from the L, G, I and A codes, to route the so generated prediction taps to the prediction unit 225. Also at step S201, the tap generator 222 generates class taps from the synthesized sound of the frame of interest and from the L, G, I and A codes to send the so generated class taps to the classification unit 223.

[0457] At step S202, the classification unit 223 executes classification based on the class taps supplied from the tap generator 222 to send the resulting class code to the coefficient memory 224. The program then moves to step S203.

[0458] At step S203, the coefficient memory 224 reads out tap coefficients from the address associated with the class code supplied from the classification unit 223 to send the read-out tap coefficients to the prediction unit 225.

[0459] At step S204, the prediction unit 225 acquires the tap coefficients output by the coefficient memory 224 and, using the tap coefficients and the prediction taps from the tap generator 221, executes the sum-of-products processing shown in equation (6) to acquire the predicted value of the speech of high sound quality of the frame of interest.

[0460] The speech of high sound quality, obtained as described above, is sent from the prediction unit 225 through the D/A converter 30 to the loudspeaker 31 which then outputs the speech of high sound quality.

[0461] After the processing of step S204, the program moves to step S205 where it is verified whether or not there is any frame to be processed as a frame of interest. If it is found that there is such a frame, the program reverts to step S201 where the next frame is set as the new frame of interest, and subsequently the similar sequence of operations is repeated. If it is found at step S205 that there is no frame to be processed as the frame of interest, the processing is terminated.
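The steps S202 to S204 may be sketched, for one predicted value, as follows. The sketch assumes that the class taps are already available as a sequence of bits which is used directly as the class code, and that the equation (6) is the sum of products of the tap coefficients and the prediction taps; the names `classify` and `predict_value` and the use of a dictionary as the coefficient memory are assumptions of this illustration.

```python
def classify(class_tap_bits):
    """Step S202: use the bit string of the class taps as the class code."""
    code = 0
    for bit in class_tap_bits:
        code = (code << 1) | (bit & 1)
    return code


def predict_value(prediction_taps, class_tap_bits, coefficient_memory):
    """Steps S202 to S204 for one predicted sample.

    `coefficient_memory` maps a class code to its tap coefficients and
    stands in for the coefficient memory (hypothetical representation).
    """
    class_code = classify(class_tap_bits)            # step S202
    coefficients = coefficient_memory[class_code]    # step S203
    # Step S204: sum of products of tap coefficients and prediction taps.
    return sum(w * x for w, x in zip(coefficients, prediction_taps))
```

In an actual device the class taps may include sample values and codes, not merely bits, in which case some compression into a class code is needed before the lookup.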

[0462] Referring to FIG. 33, an instance of a learning device for learning the tap coefficients to be stored in the coefficient memory 224 of FIG. 32 is explained.

[0463] The components from a microphone 501 to a code decision unit 515 are configured similarly to the microphone 1 to the code decision unit 15 of FIG. 1. The microphone 501 is fed with speech signals for learning so that the components from the microphone 501 to the code decision unit 515 process the speech signals for learning as in the case of FIG. 1.

[0464] The synthesized sound output by a speech synthesis filter 506 when the square error is verified to be the smallest in a minimum square error decision unit 508 is sent to tap generators 431, 432. The tap generators 431, 432 are also fed with the L, G, I and A codes output when the code decision unit 515 has received the definite signal from the minimum square error decision unit 508. The speech output by an A/D converter 502 is fed as teacher data to a normal equation addition circuit 434.

[0465] A tap generator 431 forms the same prediction taps as those of the tap generator 221 of FIG. 32, based on the synthesized sound output by the speech synthesis filter 506 and the L, G, I and A codes output by the code decision unit 515, to send the so formed prediction taps as pupil data to the normal equation addition circuit 434.

[0466] A tap generator 432 also forms the same class taps as those of the tap generator 222 of FIG. 32, from the synthesized sound output by the speech synthesis filter 506 and the L, G, I and A codes output by the code decision unit 515, and routes the so formed class taps to a classification unit 433.

[0467] Based on the class taps from the tap generator 432, the classification unit 433 performs classification in the same way as the classification unit 223 of FIG. 32 to send the resulting class code to the normal equation addition circuit 434.

[0468] The normal equation addition circuit 434 receives the speech from an A/D converter 502 as teacher data and prediction taps from the tap generator 431. The normal equation addition circuit 434 then performs summation as in the normal equation addition circuit 281 of FIG. 27 to set a normal equation shown in the equation (13) for each class from the classification unit 433.

[0469] A tap coefficient decision circuit 435 solves the normal equation, generated on the class basis by the normal equation addition circuit 434, to find tap coefficients from class to class, and sends the so found tap coefficients to the address associated with each class of the coefficient memory 436.

[0470] Depending on the speech signals, provided as speech signals for learning, there are occasions wherein, in a certain class or classes, a number of the normal equations required to find the tap coefficients cannot be produced in the normal equation addition circuit 434. For such class(es), the tap coefficient decision circuit 435 outputs e.g., default tap coefficients.

[0471] The coefficient memory 436 memorizes the class-based tap coefficients, pertinent to linear prediction coefficients and residual signals, supplied from the tap coefficient decision circuit 435.

[0472] In the above-described learning device, the processing similar to the processing conforming to the flowchart shown in FIG. 29 is performed to find tap coefficients for obtaining the synthesized sound of high sound quality.

[0473] That is, the learning device is fed with speech signals for learning and, at step S211, teacher data and pupil data are generated from these speech signals for learning.

[0474] Specifically, the speech signals for learning are input to the microphone 501. The components from the microphone 501 to the code decision unit 515 perform the processing similar to that performed by the microphone 1 to the code decision unit 15 of FIG. 1.

[0475] The result is that the speech of digital signals, obtained in the A/D converter 502, is sent as teacher data to the normal equation addition circuit 434. The synthesized sound, output by the speech synthesis filter 506 when the minimum square error decision unit 508 has verified that the square error has become smallest, is sent as pupil data to the tap generators 431, 432. The L, G, I and A codes, output by the code decision unit 515 when the minimum square error decision unit 508 has verified that the square error has become smallest, are also sent as pupil data to the tap generators 431, 432.

[0476] The program then moves to step S212 where the tap generator 431 generates prediction taps, with the frame of the synthesized sound sent as pupil data from the speech synthesis filter 506 as the frame of interest, from the L, G, I and A codes and the synthesized sound of the frame of interest, to route the so produced prediction taps to the normal equation addition circuit 434. At step S212, the tap generator 432 also generates class taps from the L, G, I and A codes and the synthesized sound of the frame of interest, to send the so generated class taps to the classification unit 433.

[0477] After processing at step S212, the program moves to step S213, where the classification unit 433 performs classification based on the class taps from the tap generator 432 to send the resulting class codes to the normal equation addition circuit 434.

[0478] The program then moves to step S214, where the normal equation addition circuit 434 performs the aforementioned summation of the matrix A and the vector v of the equation (13), for the speech of high sound quality of the frame of interest from the A/D converter 502, as teacher data, and for the prediction taps from the tap generator 431, as pupil data, for each class code from the classification unit 433. The program then moves to step S215.

[0479] At step S215, it is verified whether or not there is any speech signal for learning of a frame to be processed as the frame of interest. If it is verified at step S215 that there is such a speech signal for learning, the program reverts to step S211 where the next frame is set as a new frame of interest. The processing similar to that described above then is repeated.

[0480] If it is verified at step S215 that there is no speech signal for learning of the frame to be processed as the frame of interest, that is if the normal equation is obtained in each class in the normal equation addition circuit 434, the program moves to step S216 where the tap coefficient decision circuit 435 solves the normal equation generated for each class to find the tap coefficients for each class. These tap coefficients are sent to and stored in the address in the coefficient memory 436 associated with each class to terminate the processing.

[0481] The class-based tap coefficients, thus stored in the coefficient memory 436, are stored in the coefficient memory 224 of FIG. 32.

[0482] Consequently, the tap coefficients stored in the coefficient memory 224 of FIG. 32 have been found on learning so that the prediction errors of the prediction values of the true speech of high sound quality, obtained on carrying out linear predictive calculations, herein square errors, will be statistically minimum, so that the speech output by the prediction unit 225 of FIG. 32 is of high sound quality.

[0483] In the instances shown in FIGS. 32 and 33, the class taps are generated from the synthesized sound output by the speech synthesis filter 506 and the L, G, I and A codes. Alternatively, the class taps may also be generated from one or more of the L, G, I and A codes and from the synthesized sound output by the speech synthesis filter 506. The class taps may also be formed from linear prediction coefficients αₚ obtained from the A code, the information obtained from the L, G, I or A code, inclusive of the gain values β, γ obtained from the G code, such as residual signals e, or l, n for producing the residual signals e or with 1/β or n/γ, as shown with dotted lines in FIG. 32. The class taps may also be produced from the synthesized sound output by the speech synthesis filter 506 or the above-mentioned information derived from the L, G, I or A code. In cases where soft interpolation bits or the frame energy are contained in the code data in the CELP system, the class taps may be formed using the soft interpolation bits or the frame energy. The same may be said of the prediction taps.

[0484] FIG. 34 shows speech signals s, used as teacher data, data ss of the synthesized sound used as pupil data, residual signals e, and n, l used for finding the residual signals e in the learning device of FIG. 33.

[0485] The above-described sequence of operations may be carried out by software or by hardware. If the sequence of operations is carried out by software, the program forming the software is installed on e.g., a general-purpose computer.

[0487] The computer on which the program for executing the above-described sequence of operations is installed is configured as shown in FIG. 13, as described above, and performs operations similar to those performed by the computer shown in FIG. 13, and hence is not explained specifically for simplicity.

[0488] In the present invention, the processing steps stating the program for executing the various processing operations by a computer need not be carried out chronologically in the order stated in the flowchart, but may be processed in parallel or batch-wise, such as by parallel processing or object-based processing.

[0489] The program may be processed by a sole computer or by plural computers in a distributed fashion. Moreover, the program may be transmitted to a remotely located computer for execution.

[0490] Although no particular reference has been made in the present invention as to which sort of the speech signals for learning is to be used, the speech signals for learning may not only be the speech uttered by a speaker but may also be a musical number (music). If, in the above-described learning, the speech uttered by a speaker is used as the speech signals for learning, such tap coefficients which will improve the sound quality of the speech may be obtained, whereas, if musical numbers are used as the speech signals for learning, such tap coefficients may be obtained which will improve the sound quality of the musical number.

[0491] The present invention may be broadly applied in generating the synthesized sound from the code obtained on encoding by the CELP system, such as VSELP (Vector Sum Excited Linear Prediction), PSI-CELP (Pitch Synchronous Innovation CELP) or CS-ACELP (Conjugate Structure Algebraic CELP).

[0492] The present invention also is broadly applicable not only to such a case where the synthesized sound is generated from the code obtained on encoding by the CELP system but also to such a case where residual signals and linear prediction coefficients are obtained from a given code to generate the synthesized sound.

[0493] In the above-described embodiment, the prediction values of residual signals and linear prediction coefficients are found by one-dimensional linear predictive calculations. Alternatively, these prediction values may be found by two- or higher-dimensional predictive calculations.

[0494] In the above explanation, the classification is carried out by vector quantizing the class taps. Alternatively, the classification may also be carried out by exploiting e.g., the ADRC processing.

[0495] In the classification employing the ADRC, the elements making up the class tap, that is sampled values of the synthesized sound, or L, G, I and A codes, are processed with ADRC, and the class is determined in accordance with the resulting ADRC code.

[0496] In the K-bit ADRC, the maximum value MAX and the minimum value MIN of the elements forming the class tap are detected, DR=MAX−MIN is set as the local dynamic range of the set, and the elements forming the class tap are re-quantized into K bits based on this dynamic range DR. That is, the minimum value MIN is subtracted from the respective elements forming the class tap, and the resulting difference value is divided by DR/2^K. The K-bit values of the respective elements forming the class tap, obtained as described above, are arrayed in a preset sequence into a bit string, which is output as an ADRC code.
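The K-bit ADRC may be sketched as follows. Two details of the sketch are assumptions not stated in the above explanation: the quotient is truncated to an integer and limited to 2^K−1, so that the element equal to MAX still fits into K bits, and all elements are quantized to 0 when DR is zero. The function name `adrc_code` is likewise hypothetical, and the preset sequence is taken to be the order of the elements as given.

```python
def adrc_code(elements, k):
    """Return the K-bit ADRC code of the elements forming a class tap."""
    minimum, maximum = min(elements), max(elements)
    dynamic_range = maximum - minimum          # DR = MAX - MIN
    levels = 1 << k                            # 2^K quantization levels
    code = 0
    for x in elements:
        if dynamic_range == 0:
            q = 0  # assumption: a flat class tap maps to all-zero bits
        else:
            # (x - MIN) divided by DR / 2^K, truncated; the clamp keeps
            # the maximum element within K bits (assumption).
            q = min(int((x - minimum) / (dynamic_range / levels)),
                    levels - 1)
        # Array the K-bit values in the given order into one bit string.
        code = (code << k) | q
    return code
```

With four elements and K=1, for example, the ADRC code is a four-bit string, so that the number of classes is 16 regardless of the amplitude of the elements.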

INDUSTRIAL APPLICABILITY

[0497] According to the present invention, described above, the prediction taps used for predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, are extracted from the synthesized sound or from the code or the information derived from the code, whilst the class taps used for sorting the target speech to one of plural classes are extracted from the synthesized sound, code or the information derived from the code. The class of the target speech is found based on the class taps. Using the prediction taps and the tap coefficients corresponding to the class of the target speech, the prediction values of the target speech are found to generate the synthesized sound of high sound quality.

1. A data processing device for carrying out speech processing in which prediction taps for finding prediction values of the speech of high sound quality are extracted from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, and in which said prediction taps are used along with preset tap coefficients to perform preset predictive calculations to find said prediction values of said speech of high sound quality, said device comprising: prediction tap extracting means for extracting from said synthesized sound said prediction taps used for predicting said speech of high sound quality, as target speech, the prediction values of which are to be found; class tap extraction means for extracting a class tap, used for sorting said target speech to one of a plurality of classes, from said code, by way of classification; classification means for finding the class of said target speech based on said class tap; acquisition means for acquiring said preset tap coefficients associated with the class of said target speech from among a plurality of tap coefficients as found on learning from class to class; and prediction means for finding said prediction values of said target speech using said prediction taps and said preset tap coefficients associated with said class of said target speech.
 2. The data processing device according to claim 1 wherein said prediction means perform one-dimensional linear predictive calculations, using said prediction taps and the tap coefficients, to find the prediction values of said target speech.
 3. The data processing device according to claim 1 wherein said acquisition means acquires said tap coefficients of the class associated with said target speech from storage means holding said tap coefficients on the class basis.
 4. The data processing device according to claim 1 wherein said class tap extraction means extracts said class taps from said code and from said linear prediction coefficients or residual signals obtained on decoding said code.
 5. The data processing device according to claim 1 wherein said tap coefficients have been obtained on carrying out learning so that the prediction errors of the predicted values of the speech of high sound quality obtained on carrying out preset predictive calculations employing said prediction taps and said tap coefficients will be statistically minimum.
 6. The data processing device according to claim 1 further comprising: said speech synthesis filter.
 7. The data processing device according to claim 1 wherein said code has been obtained on encoding the speech in accordance with the CELP (Code Excited Linear Prediction Coding) system.
 8. A data processing method for carrying out speech processing of extracting prediction taps for finding prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, and of performing preset predictive calculations using prediction taps along with preset tap coefficients to find said prediction values of said speech of high sound quality, said method comprising: a prediction tap extracting step of extracting from said synthesized sound said prediction taps used for predicting said speech of high sound quality, as target speech, the prediction values of which are to be found; a class tap extraction step of extracting a class tap, used for sorting said target speech to one of a plurality of classes, by way of classification, from said code; a classification step of finding the class of said target speech based on said class tap; an acquisition step of acquiring said tap coefficients associated with the class of said target speech from among said tap coefficients as found on learning from class to class; and a prediction step of finding said prediction values of said target speech using said prediction taps and said tap coefficients associated with said class of said target speech.
 9. A recording medium having recorded thereon a program for having a computer execute speech processing of extracting prediction taps for finding prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, and of performing preset predictive calculations using said prediction taps along with preset tap coefficients to find said prediction values of said speech of high sound quality, said method comprising: a prediction tap extracting step of extracting from said synthesized sound said prediction taps used for predicting said speech of high sound quality, as target speech, the prediction values of which are to be found; a class tap extraction step of extracting class taps, used for sorting said target speech to one of a plurality of classes, by way of classification, from said code; a classification step of finding the class of said target speech based on said class taps; an acquisition step of acquiring said tap coefficients associated with the class of said target speech from among said tap coefficients as found on learning from class to class; and a prediction step of finding said prediction values of said target speech using said prediction taps and said tap coefficients associated with said class of said target speech.
 10. A learning device for learning preset class taps usable for finding, by preset predictive calculations, prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, said learning device comprising: class tap extraction means for extracting class taps from said code, said class taps being used for classifying said speech of high sound quality, as target speech, the prediction values of which are to be found; classification means for finding a class of said target speech based on said class taps; and learning means for carrying out learning so that the prediction errors of the prediction values of the speech of high sound quality obtained on carrying out predictive calculations using said tap coefficients and the synthesized sound will be statistically minimum, to find said tap coefficients from class to class.
 11. The learning device according to claim 10 wherein said learning means carries out learning so that the prediction errors of the prediction values of the speech of high sound quality obtained on carrying out one-dimensional linear predictive calculations using said tap coefficients and the synthesized sound will be statistically minimum.
 12. The learning device according to claim 10 wherein said class tap extraction means extracts said class taps from said code and from said linear prediction coefficients and said residual signals obtained on decoding said code.
 13. The learning device according to claim 10 wherein said code is obtained on encoding the speech in accordance with the CELP (Code Excited Linear Prediction Coding) system.
 14. A learning method for learning preset class taps usable for finding, by preset predictive calculations, prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, said learning device comprising: a class tap extraction step of extracting class taps from said code, said class taps being used for classifying said speech of high sound quality, as target speech, the prediction values of which are to be found; a classification step of finding a class of said target speech based on said class taps; and a learning step of carrying out learning so that the prediction errors of the prediction values of the speech of high sound quality obtained on carrying out predictive calculations using said tap coefficients and the synthesized sound will be statistically minimum, to find said tap coefficients from class to class.
 15. A recording medium having recorded thereon a program for having a computer execute learning processing of learning preset class taps usable for finding, by preset predictive calculations, prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, said learning device comprising: a class tap extraction step of extracting class taps from said code, said class taps being used for classifying said speech of high sound quality, as target speech, the prediction values of which are to be found; a classification step of finding a class of said target speech based on said class taps; and a learning step of carrying out learning so that the prediction errors of the prediction values of the speech of high sound quality obtained on carrying out predictive calculations using said tap coefficients and the synthesized sound will be statistically minimum, to find said tap coefficients from class to class.
 16. A data processing device for generating, from a preset code, filter data to be afforded to a speech synthesis filter adapted for synthesizing the speech based on linear prediction coefficients and a preset input signal, comprising: code decoding means for decoding said code to output decoded filter data; acquisition means for acquiring preset tap coefficients as found by carrying out learning; and prediction means for carrying out preset predictive calculations, using said tap coefficients and the decoded filter data, to find prediction values of said filter data, to send the so found prediction values to said speech synthesis filter.
 17. The data processing device according to claim 16 wherein said prediction means carries out one-dimensional linear predictive calculations to find prediction values of said filter data.
 18. The data processing device according to claim 16 wherein said acquisition means acquires said tap coefficients from storage means holding said tap coefficients.
 19. The data processing device according to claim 16 further comprising: prediction tap extraction means for extracting prediction taps from said decoded filter data, said prediction taps being usable along with said tap coefficients for predicting said filter data, as filter data of interest, the prediction values of which are to be found, said prediction means carrying out predictive calculations using said prediction tap and tap coefficients.
 20. The data processing device according to claim 19 further comprising: class tap extraction means for extracting class taps from said decoded filter data, said class taps being used for sorting said filter data of interest to one of a plurality of classes, by way of classification, and classification means for finding the class for said filter data of interest, based on said class taps; said prediction means carrying out predictive calculations using said prediction taps and said tap coefficients associated with the class of said filter data of interest.
 21. The data processing device according to claim 19 further comprising: class tap extraction means for extracting class taps from said code, said class tap being used for sorting said filter data of interest to one of a plurality of classes, by way of classification, and classification means for finding the class for said filter data of interest, based on said class tap; said prediction means carrying out predictive calculations using said prediction taps and said tap coefficients associated with the class of said filter data of interest.
 22. The data processing device according to claim 21 wherein said class tap extraction means extracts said class taps from both said code and said decoded filter data.
 23. The data processing device according to claim 16 wherein said tap coefficients have been obtained on carrying out learning so that the prediction errors of the predicted values of said filter data obtained on carrying out preset predictive calculations employing said tap coefficients and decoded filter data will be statistically minimum.
 24. The data processing device according to claim 16 wherein said filter data is at least one or both of said input signal and said linear prediction coefficients.
 25. The data processing device according to claim 16 further comprising: said speech synthesis filter.
 26. The data processing according to claim 16 wherein said code is obtained on encoding the speech in accordance with the CELP (Code Excited Linear Prediction Coding) system.
 27. A data processing method for generating, from a preset code, filter data to be afforded to a speech synthesis filter adapted for synthesizing the speech based on linear prediction coefficients and on a preset input signal, comprising: a code decoding step of decoding said code to output decoded filter data; an acquisition step of acquiring preset tap coefficients as found by carrying out learning; and a prediction step of carrying out preset predictive calculations, using said tap coefficients and the decoded filter data, to find prediction values of said filter data, to send the so found prediction values to said speech synthesis filter.
 28. A recording medium having recorded thereon a program for having a computer execute data processing of generating, from a preset code, filter data to be afforded to a speech synthesis filter adapted for synthesizing the speech based on linear prediction coefficients and a preset input signal, comprising: a code decoding step of decoding said code to output decoded filter data; an acquisition step of acquiring preset tap coefficients as found by carrying out learning; and a prediction step of carrying out preset predictive calculations, using said tap coefficients and the decoded filter data, to find prediction values of said filter data, to send the so found prediction values to said speech synthesis filter.
 29. A learning device for learning preset tap coefficients usable for finding, by predictive calculations from a code associated with filter data to be applied to a speech synthesis filter which synthesizes the speech based on linear prediction coefficients and a preset input signal, prediction values of said filter data, comprising: code decoding means for decoding the code corresponding to filter data to output decoded filter data; and learning means for carrying out learning so that prediction errors of prediction values of said filter data obtained on carrying out predictive calculations using said tap coefficients and decoded filter data will be statistically smallest to find said tap coefficients.
 30. The learning device according to claim 29 wherein said learning means performs the learning so that the prediction errors of the prediction values of said filter data obtained on carrying out one-dimensional linear predictive calculations using said tap coefficients and the decoded filter data will be statistically smallest.
 31. The learning device according to claim 29 further comprising: predictive tap extraction means for extracting from said decoded filter data prediction taps used along with said tap coefficients for predicting said filter data, the prediction values of which are to be found, as said filter data of interest; said learning means effecting learning so that the prediction errors of prediction values of said filter data obtained on carrying out predictive calculations using said prediction taps and tap coefficients will be statistically smallest.
 32. The learning device according to claim 31 further comprising: class tap extraction means for extracting a class tap from said decoded filter data, said class tap being used for sorting said filter data of interest to one of a plurality of classes, by way of classification, and classification means for finding the class for said filter data of interest, based on said class tap; said learning means performing learning so that the prediction errors of prediction values of said filter data obtained on carrying out predictive calculations using said prediction taps and said tap coefficients associated with the class of said filter data of interest will be statistically smallest.
 33. The learning device according to claim 31 further comprising: class tap extraction means for extracting a class tap from said code, said class tap being used for sorting said filter data of interest to one of a plurality of classes, by way of classification, and classification means for finding the class for said filter data of interest, based on said class taps; said learning means performing learning so that the prediction errors of prediction values of said filter data obtained on carrying out predictive calculations using said prediction taps and tap coefficients will be statistically smallest.
 34. The learning device according to claim 33 wherein said class tap extraction means extracts said class taps from both said code and said decoded filter data.
 35. The learning device according to claim 29 wherein said filter data is at least one or both of said input signal and said linear prediction coefficients.
 36. The learning device according to claim 29 wherein said code is obtained on encoding the speech in accordance with the CELP (Code Excited Linear Prediction Coding) system.
 37. A learning method for learning preset tap coefficients usable for finding, by predictive calculations from a code associated with filter data to be applied to a speech synthesis filter which synthesizes the speech based on linear prediction coefficients and a preset input signal, prediction values of said filter data, comprising: a code decoding step of decoding the code corresponding to filter data to output decoded filter data; and a learning step of carrying out learning so that the prediction errors of prediction values of said filter data obtained on carrying out predictive calculations using said tap coefficients and decoded filter data will be statistically smallest to find said tap coefficients.
 38. A recording medium having recorded thereon a program for having a computer execute learning processing of learning preset tap coefficients usable for finding, by predictive calculations from a code associated with filter data to be applied to a speech synthesis filter which synthesizes the speech based on linear prediction coefficients and a preset input signal, prediction values of said filter data, comprising: a code decoding step of decoding the code corresponding to filter data to output decoded filter data; and a learning step of carrying out learning so that the prediction errors of prediction values of said filter data obtained on carrying out predictive calculations using said tap coefficients and decoded filter data will be statistically smallest to find said tap coefficients.
 39. A speech processing device for finding prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, comprising: prediction tap extraction means for extracting prediction taps usable for predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, class tap extraction means for extracting class taps, usable for sorting the target speech to one of a plurality of classes, by way of classification, from said synthesized sound, said code or the information derived from said code; acquisition means for acquiring said tap coefficients associated with the class of said target speech from the tap coefficients as found on learning from one class to another; and prediction means for finding the prediction values of said target speech using said prediction taps and said tap coefficients associated with the class of said target speech.
 40. The data processing device according to claim 39 wherein said prediction means effects one-dimensional linear predictive calculations, using said prediction taps and tap coefficients, to find prediction values of said target speech.
 41. The data processing device according to claim 39 wherein said acquisition means acquires said tap coefficients of the class associated with said target speech from storage means holding said tap coefficients from class to class.
 42. The data processing device according to claim 39 wherein said prediction tap extraction means or class tap extraction means extracts said prediction taps or class tap from said synthesized sound, said code or the information derived from said code.
 43. The data processing device according to claim 39 wherein said tap coefficients have been obtained on carrying out learning so that the prediction errors of the predicted values of said speech of high sound quality obtained on carrying out preset predictive calculations employing said prediction taps and tap coefficients will be statistically minimum.
 44. The data processing device according to claim 39 further comprising: a speech synthesis filter.
 45. The data processing device according to claim 39 wherein said code has been obtained on coding the speech with CELP (Code Excited Linear Prediction Coding) system.
 46. A speech processing method for finding prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, comprising: a prediction tap extraction step of extracting prediction taps usable for predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, from said synthesized sound, said code or the information derived from said code; a class tap extraction step of extracting a class tap, usable for sorting the target speech to one of a plurality of classes, by way of classification, from said synthesized sound, said code or the information derived from said code; a classification step of finding the class of said target speech based on said class tap; an acquisition step of acquiring said tap coefficients associated with the class of said target speech from the tap coefficients as found on learning from one class to another; and a prediction step of finding the prediction values of said target speech using said prediction taps and said tap coefficients associated with the class of said target speech.
 47. A recording medium having recorded thereon a program for having a computer execute speech processing of finding prediction values of the speech of high sound quality from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, comprising: a prediction tap extraction step of extracting prediction taps usable for predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, a class tap extraction step of extracting class taps, usable for sorting the target speech to one of a plurality of classes, by way of classification, from said synthesized sound, said code or the information derived from said code; an acquisition step of acquiring said tap coefficients associated with the class of said target speech from the tap coefficients as found on learning from one class to another; and a prediction step of finding the prediction values of said target speech using said prediction taps and said tap coefficients associated with the class of said target speech.
 48. A learning device for learning preset tap coefficients usable for finding, by preset predictive calculations, prediction values of the speech of high sound quality, from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, comprising: prediction tap extraction means for extracting prediction taps usable in predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, from said synthesized sound, said code or the information derived from said code; class tap extraction means for extracting class taps usable for sorting the target speech to one of a plurality of classes, by way of classification, from said synthesized sound, said code or the information derived from said code; classification means for finding the class of said target speech based on said class taps; and learning means for carrying out learning so that the prediction errors of prediction values of said speech of high sound quality, obtained on carrying out predictive calculations using said tap coefficients and said prediction taps, will be statistically smallest.
 49. The learning device according to claim 48 wherein said learning means carries out learning so that the prediction errors of the prediction values of the speech of high sound quality obtained on carrying out the one-dimensional linear predictive calculations using said tap coefficients and the prediction taps will be statistically smallest.
 50. The learning device according to claim 48 wherein said prediction tap extraction means or class tap extraction means extract said prediction taps or the class taps from the synthesized sound, said code and the information derived from said code.
 51. The learning device according to claim 48 wherein said code has been obtained on coding the speech with CELP (Code Excited Linear Prediction Coding) system.
 52. A learning method for learning preset tap coefficients usable for finding, by preset predictive calculations, prediction values of the speech of high sound quality, from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, comprising: a prediction tap extraction step of extracting prediction taps usable in predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, from said synthesized sound, said code or the information derived from said code; a class tap extraction step of extracting a class tap usable for sorting the target speech to one of a plurality of classes, by way of classification, from said synthesized sound, said code or the information derived from said code; a classification step of finding the class of said target speech based on said class tap; and a learning step of carrying out learning so that the prediction errors of prediction values of said speech of high sound quality, obtained on carrying out predictive calculations using said tap coefficients and said prediction taps, will be statistically smallest, to find said tap coefficients.
 53. A recording medium having recorded thereon a program for having a computer execute learning processing of learning preset tap coefficients usable for finding, by preset predictive calculations, prediction values of the speech of high sound quality, from the synthesized sound obtained on affording linear prediction coefficients and residual signals, generated from a preset code, to a speech synthesis filter, said speech of high sound quality being higher in sound quality than said synthesized sound, comprising: prediction tap extraction step of extracting prediction taps usable in predicting the speech of high sound quality, as target speech, the prediction values of which are to be found, from said synthesized sound, said code or the information derived from said code; a class tap extraction step of extracting a class tap usable for sorting the target speech to one of a plurality of classes, by way of classification, from said synthesized sound, said code or the information derived from said code; a classification step of finding the class of said target speech based on said class tap; and a learning step of carrying out learning so that the prediction errors of prediction values of said speech of high sound quality, obtained on carrying out predictive calculations using said tap coefficients and said prediction taps, will be statistically smallest, to find said tap coefficients.